- Bug
- Resolution: Duplicate
- Undefined
- None
- 4.11
- None
- Important
- No
- Proposed
- False
Description of problem:
Start with a 4.10 cluster where some nodes have both the infra and worker roles (or the master and worker roles) while others have only the worker role. The infra nodes are managed by their own infra MCP, which inherits the default worker machineconfigs (in the supported way). If we upgrade such a cluster to 4.11, the upgrade completes, but keepalived-monitor never performs the unicast migration, because it mistakenly believes that the upgrade has never finished. Error messages like the following confirm this:

time="2023-06-23T14:40:00Z" level=info msg="Failed to retrieve upgrade status or Upgrade still running" err="<nil>" upgradeRunning=true

If we temporarily remove one of the roles, so that each node has only one role, the migration eventually happens and completes successfully.
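To make the suspected failure mode concrete, here is a minimal, self-contained Go sketch. It is not the actual baremetal-runtimecfg code; Node, desiredMCByPool, and upgradeStillRunning are hypothetical stand-ins. It shows how an upgrade check that assumes each node maps to exactly one pool by its role labels ends up comparing a dual-role node against the wrong pool's rendered config and never converging.

package main

import (
	"fmt"
	"strings"
)

// Node is a pared-down stand-in for a corev1.Node: its labels plus the
// MCO annotation that records which rendered MachineConfig it runs.
type Node struct {
	Name      string
	Labels    map[string]string
	CurrentMC string // machineconfiguration.openshift.io/currentConfig
}

// desiredMCByPool maps a pool name to the rendered config it targets
// after the upgrade (hypothetical values for the sketch).
var desiredMCByPool = map[string]string{
	"worker": "rendered-worker-4.11",
	"infra":  "rendered-infra-4.11",
}

// rolesOf collects every node-role.kubernetes.io/* label on the node.
func rolesOf(n Node) []string {
	var roles []string
	for k := range n.Labels {
		if strings.HasPrefix(k, "node-role.kubernetes.io/") {
			roles = append(roles, strings.TrimPrefix(k, "node-role.kubernetes.io/"))
		}
	}
	return roles
}

// upgradeStillRunning mimics the suspected flaw: it implicitly assumes
// each node belongs to exactly one pool, so a dual-role node is also
// compared against the other pool's desired config, and that second
// comparison can never succeed.
func upgradeStillRunning(nodes []Node) bool {
	for _, n := range nodes {
		for _, role := range rolesOf(n) {
			if desiredMCByPool[role] != n.CurrentMC {
				return true // reported forever for dual-role nodes
			}
		}
	}
	return false
}

func main() {
	infraWorker := Node{
		Name: "infra-0",
		Labels: map[string]string{
			"node-role.kubernetes.io/infra":  "",
			"node-role.kubernetes.io/worker": "",
		},
		// The infra pool won, and the node is fully upgraded onto the
		// infra rendered config...
		CurrentMC: "rendered-infra-4.11",
	}
	// ...yet the worker-role comparison still fails, so the monitor
	// keeps postponing the migration.
	fmt.Println("upgradeRunning =", upgradeStillRunning([]Node{infraWorker}))
}

Run as-is, this prints upgradeRunning = true even though the node is fully on its own pool's rendered config, which matches the log line above.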
Version-Release number of selected component (if applicable):
Any 4.11 release; tested on the current latest, 4.11.43.
How reproducible:
Always
Steps to Reproduce:
1. Install a 4.10 cluster on a platform that uses keepalived.
2. Label one worker node with "node-role.kubernetes.io/infra", so that it is both infra and worker (it carries both the "node-role.kubernetes.io/infra" and "node-role.kubernetes.io/worker" labels).
3. Create an infra MCP that adopts nodes with the "node-role.kubernetes.io/infra" label (see the sketch after this list).
4. Update to 4.11.
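For steps 2 and 3, this is a minimal sketch of the setup, following the standard documented pattern for a custom infra pool that inherits the worker machineconfigs; the node name infra-node-0 is a placeholder:

oc label node infra-node-0 node-role.kubernetes.io/infra=

cat <<EOF | oc apply -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: infra
spec:
  machineConfigSelector:
    matchExpressions:
      - key: machineconfiguration.openshift.io/role
        operator: In
        values: [worker, infra]
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/infra: ""
EOF

The machineConfigSelector entry matching both worker and infra is what makes the infra pool inherit the default worker machineconfigs, as described above.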
Actual results:
After the upgrade completes, the unicast migration never happens because the keepalived-monitor pods believe the upgrade hasn't finished. We have to remove role labels so that each node has only one role before keepalived-monitor detects the end of the upgrade.
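For reference, the message quoted in the description can be checked like this, assuming a bare-metal IPI cluster where the keepalived static pods run in the openshift-kni-infra namespace (the namespace differs on other keepalived platforms) and include a keepalived-monitor container:

oc logs -n openshift-kni-infra keepalived-<node-name> -c keepalived-monitor | grep -i upgrade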
Expected results:
Unicast migration should happen once the upgrade has finished, even if some nodes have more than one role.
Additional info:
A source code analysis of why I believe this happens will follow in a comment on this bug.
- duplicates OCPBUGS-14403: IngressVIP getting attach to two nodes at once (Closed)