- Type: Bug
- Resolution: Duplicate
- Affected versions: 4.11.z, 4.10.z
- Impact: Quality / Stability / Reliability
- Severity: Important
- Rejected
During the 4.10 to 4.11 upgrade, the ingress VIP was active on more than one node at the same time, causing the upgrade to fail. After some digging we found that some nodes were configured to use unicast for keepalived and some were not, resulting in a split-brain situation with two keepalived masters for the ingress VIP. Keepalived shouldn't switch to unicast until after the cluster upgrade is complete, but we found that for a period of around 2 hours during the upgrade keepalived was split-brained: the switch to unicast was made before all nodes had been upgraded to 4.11.25. We found a similar bug [1], but that bug says keepalived never changes to unicast, whereas in this new issue the change to unicast does happen, just not on all keepalived pods at the same time, so for up to 2 hours some run in unicast and some in multicast. I am wondering if it is the same issue.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=2053309
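In case it helps with triage: a quick way to confirm the mixed-mode window is to compare the rendered keepalived config on every node while the upgrade is in flight. The Python sketch below is my own ad-hoc check, not part of any product tooling; the config path and the unicast_peer heuristic are assumptions (unicast_peer is the standard keepalived directive for unicast VRRP) and may need adjusting for your cluster.

#!/usr/bin/env python3
# Ad-hoc diagnostic sketch: report each node's keepalived transport mode
# so a mixed unicast/multicast state can be spotted during the upgrade.
# Assumptions: `oc debug` access to the nodes, and that the rendered
# keepalived config lives at CONF_PATH (adjust for your cluster).
import subprocess

CONF_PATH = "/etc/keepalived/keepalived.conf"  # assumed path

def keepalived_mode(node: str) -> str:
    # keepalived uses multicast VRRP unless unicast_peer entries are present
    out = subprocess.run(
        ["oc", "debug", f"node/{node}", "--", "chroot", "/host",
         "cat", CONF_PATH],
        capture_output=True, text=True, check=True,
    ).stdout
    return "unicast" if "unicast_peer" in out else "multicast"

def main() -> None:
    names = subprocess.run(
        ["oc", "get", "nodes", "-o", "name"],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    modes = {n.split("/")[-1]: keepalived_mode(n.split("/")[-1]) for n in names}
    for node, mode in sorted(modes.items()):
        print(f"{node}: {mode}")
    if len(set(modes.values())) > 1:
        print("WARNING: mixed unicast/multicast detected; VIP split-brain risk")

if __name__ == "__main__":
    main()

Running this repeatedly during the upgrade window should show some nodes reporting unicast and others multicast for the roughly 2-hour period described above.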