Let's say we start with a 4.10 cluster where some nodes have both the infra and worker roles (or the master and worker roles) while others have only the worker role. The infra nodes are managed by their own infra MachineConfigPool, which inherits the default worker MachineConfigs (in the supported way).
In such a cluster, if we upgrade to 4.11, the upgrade completes but keepalived-monitor never performs the migration, because it mistakenly believes the upgrade is still in progress. This can be confirmed by log messages like the following:
time="2023-06-23T14:40:00Z" level=info msg="Failed to retrieve upgrade status or Upgrade still running" err="<nil>" upgradeRunning=true
If we temporarily remove one of the roles, so that each node carries a single role, the migration eventually runs and completes successfully.
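The fact that dual-role nodes block the check while single-role nodes do not suggests a per-role tally that double-counts nodes carrying two roles. The following is a hypothetical illustration of that failure mode — it is not the actual keepalived-monitor code, and the real cause may differ; node names, the `upgrade_running` helper, and the data layout are all invented for the sketch.

```python
# Hypothetical sketch of a naive "upgrade finished?" check that tallies
# nodes once per role. A node with two roles contributes two entries to
# the total, so the count of upgraded nodes can never reach it and the
# check reports the upgrade as still running forever.

def upgrade_running(nodes, target_version):
    """Return True while the naive check thinks the upgrade is ongoing."""
    # BUG (illustrative): summing role memberships double-counts
    # dual-role nodes in the expected total.
    total = sum(len(n["roles"]) for n in nodes)
    upgraded = sum(1 for n in nodes if n["version"] == target_version)
    return upgraded < total

nodes = [
    {"name": "node-a", "roles": ["infra", "worker"], "version": "4.11"},
    {"name": "node-b", "roles": ["worker"], "version": "4.11"},
]

# Both nodes are already on 4.11, yet the naive tally sees 2 upgraded
# nodes against 3 role memberships and keeps reporting upgradeRunning=true.
print(upgrade_running(nodes, "4.11"))
```

With single-role nodes the two counts line up and the check terminates, which matches the observed workaround of temporarily stripping the extra role from each node.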