OpenShift Bugs / OCPBUGS-7749

Change to unicast happens, causing an outage of the ingress service during upgrade


    • Type: Bug
    • Resolution: Duplicate
    • Priority: Undefined
    • Affects Version/s: 4.11.z, 4.10.z
    • Quality / Stability / Reliability
    • Important
    • Rejected

      During the 4.10 to 4.11 upgrade, the ingress VIP was active on more than one node at the same time, causing the upgrade to fail. After some digging, we found that some nodes were configured to use unicast for keepalived and some were not, resulting in effectively a split-brain situation with two keepalived masters for the ingress VIP.

      keepalived should not switch to unicast until the cluster upgrade is complete, but we found a period of around 2 hours during the upgrade when keepalived was in this split-brain state: not all nodes had been upgraded to 4.11.25 before the switch to unicast was made.

      We have found a similar bug [1], but that bug says keepalived never changes to unicast. In this issue the change to unicast does happen, just not in all keepalived pods at the same time, so some pods run in unicast and some in multicast for up to 2 hours. I am wondering if it is the same issue.

      [1] https://bugzilla.redhat.com/show_bug.cgi?id=2053309
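      The split-brain described above can be illustrated with a small Python model. This is a sketch under stated assumptions, not keepalived's election code: it assumes that nodes running in unicast mode and nodes running in multicast mode never receive each other's VRRP advertisements, so each group elects its own master for the same VIP. The `Node` class, the `elect_masters` function, the node names, and the priority values are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    # Hypothetical snapshot of one node's keepalived instance for the ingress VIP.
    name: str
    mode: str      # "unicast" or "multicast"
    priority: int  # VRRP priority; the highest value wins an election


def elect_masters(nodes):
    """Return the names of every node that would claim the VIP.

    Simplified model (an assumption, not keepalived's code): nodes in
    different modes never see each other's advertisements, so each mode
    forms its own domain and elects its own MASTER.
    """
    domains = defaultdict(list)
    for node in nodes:
        domains[node.mode].append(node)
    # Real VRRP breaks priority ties by IP address; this sketch uses the name.
    return sorted(
        max(group, key=lambda n: (n.priority, n.name)).name
        for group in domains.values()
    )


# Mid-upgrade: one node already switched to unicast, two still on multicast.
mixed = [
    Node("worker-0", "unicast", 40),
    Node("worker-1", "multicast", 50),
    Node("worker-2", "multicast", 40),
]
print(elect_masters(mixed))  # ['worker-0', 'worker-1']: two masters, split brain

# After every node has switched, a single domain elects a single master.
print(elect_masters([Node(n.name, "unicast", n.priority) for n in mixed]))  # ['worker-1']
```

      With mixed modes the model returns two masters for one VIP, which matches the symptom in the report; once all nodes run in the same mode, only one master remains.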

              Benjamin Nemec (bnemec@redhat.com)
              Aditya Pawar (rh-ee-adpawar)
              Anurag Saxena
              Votes: 0
              Watchers: 3

                Created:
                Updated:
                Resolved: