
[release-4.20] egressIP GARP sent by incorrect node after reboot

    • Quality / Stability / Reliability
    • False
    • Important
    • CORENET Sprint 277, CORENET Sprint 278
    • 2
    • In Progress
    • Bug Fix
      Before this update, if the `OVNKube-controller` on a node failed to process updates and configure its local OVN database, the `OVN-controller` could connect to this stale database. This caused the `OVN-controller` to consume outdated EgressIP configurations and send incorrect Gratuitous ARPs (GARPs) for an IP address that might have already moved to a different node. With this release, the `OVN-controller` is blocked from sending these GARPs during the time when the `OVNKube-controller` is not processing updates. As a result, network disruptions are prevented by ensuring GARPs are not sent based on stale database information. (link:https://issues.redhat.com/browse/OCPBUGS-62273[OCPBUGS-62273])

      This is a clone of issue OCPBUGS-42303. The following is the description of the original issue:

      Description of problem:
      After rebooting an egressIP assignee node (node A), the egressIP is correctly moved to a different node if one is available (node B). However, when ovnkube-node starts on node A, it sends a couple of Gratuitous ARP (GARP) requests for that egressIP, which is no longer assigned to it.

      Version-Release number of selected component (if applicable):

      4.15 (also reproduced in 4.16.11)

      How reproducible:

      100% Always

      Steps to Reproduce:

      1. Configure an egressIP and two assignable nodes.

      2. Reboot the assignee node.

      3. Verify that the egressIP moved to another node and that, when ovnkube-node starts on the previous assignee, it sends a couple of GARP requests for the egressIP that is now assigned to the other node.
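
      The check in step 3 can be sketched as a small script that scans captured ARP payloads for GARPs announcing the egressIP from a MAC other than the current assignee's. This is only an illustrative sketch, not part of the original report: the helper names (`parse_arp`, `find_stale_garps`, `build_arp`), the MAC addresses, and the IP 192.0.2.10 are all hypothetical, and it assumes the packets are already available as raw 28-byte Ethernet/IPv4 ARP payloads (for example, extracted from a capture taken on a neighbor).

```python
import socket
import struct
from typing import List, NamedTuple, Optional


class ArpPacket(NamedTuple):
    opcode: int        # 1 = request, 2 = reply
    sender_mac: str
    sender_ip: str
    target_mac: str
    target_ip: str


def _mac_str(raw: bytes) -> str:
    return ":".join(f"{b:02x}" for b in raw)


def parse_arp(payload: bytes) -> Optional[ArpPacket]:
    """Parse an Ethernet/IPv4 ARP payload (no Ethernet header)."""
    if len(payload) < 28:
        return None
    htype, ptype, hlen, plen, opcode = struct.unpack("!HHBBH", payload[:8])
    if (htype, ptype, hlen, plen) != (1, 0x0800, 6, 4):
        return None  # not Ethernet/IPv4 ARP
    sha, spa, tha, tpa = struct.unpack("!6s4s6s4s", payload[8:28])
    return ArpPacket(opcode, _mac_str(sha), socket.inet_ntoa(spa),
                     _mac_str(tha), socket.inet_ntoa(tpa))


def find_stale_garps(payloads: List[bytes], egress_ip: str,
                     expected_mac: str) -> List[ArpPacket]:
    """Return GARPs for egress_ip whose sender MAC is not expected_mac.

    A packet is treated as gratuitous when sender IP == target IP.
    """
    stale = []
    for payload in payloads:
        pkt = parse_arp(payload)
        if pkt is None:
            continue
        is_garp = pkt.sender_ip == pkt.target_ip == egress_ip
        if is_garp and pkt.sender_mac != expected_mac.lower():
            stale.append(pkt)
    return stale


def build_arp(opcode: int, sender_mac: str, sender_ip: str,
              target_mac: str, target_ip: str) -> bytes:
    """Hypothetical helper that builds a payload for the example below."""
    return (struct.pack("!HHBBH", 1, 0x0800, 6, 4, opcode)
            + bytes.fromhex(sender_mac.replace(":", ""))
            + socket.inet_aton(sender_ip)
            + bytes.fromhex(target_mac.replace(":", ""))
            + socket.inet_aton(target_ip))


# Hypothetical values: node B currently holds the egressIP, node A just rebooted.
NODE_A_MAC = "0a:58:0a:00:00:0a"
NODE_B_MAC = "0a:58:0a:00:00:0b"
EGRESS_IP = "192.0.2.10"
ZERO_MAC = "00:00:00:00:00:00"

capture = [
    build_arp(1, NODE_B_MAC, EGRESS_IP, ZERO_MAC, EGRESS_IP),       # valid GARP
    build_arp(1, NODE_A_MAC, EGRESS_IP, ZERO_MAC, EGRESS_IP),       # stale GARP
    build_arp(1, NODE_A_MAC, "192.0.2.20", ZERO_MAC, "192.0.2.1"),  # ordinary ARP
]
stale = find_stale_garps(capture, EGRESS_IP, expected_mac=NODE_B_MAC)
```

      With the sample capture above, only the GARP sent from node A's MAC is flagged; the GARP from node B and the ordinary ARP request are ignored.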

      Actual results:

      The previous egressIP assignee node sends GARPs for an egressIP that is not assigned to it, causing neighbors in the network to record the wrong MAC address for the egressIP in their ARP tables.
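
      The effect on a neighbor can be illustrated with a toy model of an ARP table. This is a simplified sketch that assumes the neighbor overwrites its entry on every GARP it receives (last writer wins); real network stacks and switches may behave differently. The function name `apply_garps` and all addresses are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def apply_garps(arp_table: Dict[str, str],
                garps: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Apply (ip, sender_mac) GARPs in arrival order; the last one wins."""
    table = dict(arp_table)
    for ip, sender_mac in garps:
        table[ip] = sender_mac
    return table


# Hypothetical addresses for illustration only.
NODE_A_MAC = "0a:58:0a:00:00:0a"   # previous assignee, just rebooted
NODE_B_MAC = "0a:58:0a:00:00:0b"   # current assignee
EGRESS_IP = "192.0.2.10"

events = [
    (EGRESS_IP, NODE_B_MAC),  # failover: node B announces the egressIP
    (EGRESS_IP, NODE_A_MAC),  # ovnkube-node starts on node A: stale GARP
]
neighbor_table = apply_garps({EGRESS_IP: NODE_A_MAC}, events)
```

      In this model the neighbor ends up mapping the egressIP back to node A's MAC, even though node B is the current assignee, and the entry stays wrong until it is refreshed.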

      Expected results:

      The previous egressIP assignee node should not send GARPs for an egressIP not assigned to it.

      Additional info:
      This behavior is particularly problematic when an ARP caching mechanism is enabled in the underlying infrastructure.

              mkennell@redhat.com Martin Kennelly
              rhn-support-cpassare Christian Passarelli
              Jean Chen Jean Chen