OCPBUGS-62670

[release-4.19] egressIP GARP sent by incorrect node after reboot

    • Quality / Stability / Reliability
    • Important
    • CORENET Sprint 277, CORENET Sprint 278
    • 2
    • In Progress
    • Bug Fix
      The OVNKube-controller and OVN-controller containers are both part of the OVNKube node pod, which runs on every Node alongside the OVN databases. Before 4.14, the OVN databases were centralised on the control plane nodes. In 4.14 and above, the OVN databases are decentralised. If OVNKube-controller is not processing updates from the Kubernetes API server and configuring the OVN databases on its Node, then OVN-controller, which consumes those databases, may connect to them before OVNKube-controller has configured them. This can cause OVN-controller to sync with a stale OVN database, consume the SNATs that are configured to support EgressIP, and send GARPs for the associated IP even though that IP may have moved to another Node. This fix blocks those GARPs while OVNKube-controller is not processing updates.
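The gating behaviour described in the release note text can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the actual OVN-Kubernetes or OVN implementation: the `NodeModel` class, its fields, and the `gate_enabled` flag are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class NodeModel:
    """Toy model of one node. All names here are hypothetical."""

    name: str
    # Egress IPs that have SNAT entries in the node-local OVN database.
    # After a reboot this content may be stale.
    local_db_snats: set = field(default_factory=set)
    # True once the OVNKube-controller stand-in has processed API server updates.
    controller_synced: bool = False

    def sync_from_api(self, desired_snats: set) -> None:
        """Overwrite the local database with the state from the API server."""
        self.local_db_snats = set(desired_snats)
        self.controller_synced = True

    def garps_to_send(self, gate_enabled: bool) -> list:
        """Return the egress IPs this node would announce with a GARP."""
        if gate_enabled and not self.controller_synced:
            # Modelled fix: stay silent until the database is known to be current.
            return []
        return sorted(self.local_db_snats)


# Node A rebooted while holding 192.0.2.10 (a documentation-range address).
# The IP has since moved to node B, but node A's local database still lists it.
node_a = NodeModel("node-a", local_db_snats={"192.0.2.10"})
print(node_a.garps_to_send(gate_enabled=False))  # stale announcement
print(node_a.garps_to_send(gate_enabled=True))   # nothing announced yet
node_a.sync_from_api(set())                      # API server says: no egress IP here
print(node_a.garps_to_send(gate_enabled=True))   # still nothing, now for the right reason
```

Without the gate, the model announces the stale IP as soon as it starts. With the gate, it announces nothing until the local database has been reconciled.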

      This is a clone of issue OCPBUGS-42303. The following is the description of the original issue:

      Description of problem:
      After rebooting an egressIP assignee node (node A), the egressIP is correctly moved to a different node if one is available (node B). However, when ovnkube-node starts on node A, it sends a couple of gratuitous ARP (GARP) requests for that egressIP, which is no longer assigned to it.

      Version-Release number of selected component (if applicable):

      4.15 (also reproduced in 4.16.11)

      How reproducible:

      100% Always

      Steps to Reproduce:

      1. Configure egressIP and 2 assignable nodes 

      2. Reboot the assignee node. 

      3. Verify that the egressIP moved to another node and that, when ovnkube-node starts on the previous assignee, it sends a couple of GARP requests for the egressIP that is now assigned to the other node.
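One way to check step 3 in a packet capture is to look for ARP packets whose sender and target protocol addresses are both the egressIP (the usual shape of a gratuitous ARP) but whose sender MAC is not the current assignee's. The following is a minimal sketch, assuming the input is the raw 28-byte Ethernet/IPv4 ARP payload (the bytes after the 14-byte Ethernet header). The helper names, MAC addresses, and IPs are made up for illustration.

```python
import socket
import struct

# Ethernet/IPv4 ARP payload layout (RFC 826):
# htype, ptype, hlen, plen, oper, sender MAC, sender IP, target MAC, target IP
ARP_FORMAT = "!HHBBH6s4s6s4s"
ARP_LEN = struct.calcsize(ARP_FORMAT)  # 28 bytes


def build_arp(sender_mac: str, sender_ip: str, target_ip: str, oper: int = 1) -> bytes:
    """Build an ARP payload for test input; oper 1 is a request, 2 is a reply."""
    sha = bytes.fromhex(sender_mac.replace(":", ""))
    return struct.pack(
        ARP_FORMAT, 1, 0x0800, 6, 4, oper,
        sha, socket.inet_aton(sender_ip),
        b"\x00" * 6, socket.inet_aton(target_ip),
    )


def parse_arp(payload: bytes) -> dict:
    """Decode the fields needed to recognise a gratuitous ARP."""
    _, _, _, _, oper, sha, spa, _tha, tpa = struct.unpack(ARP_FORMAT, payload[:ARP_LEN])
    return {
        "oper": oper,
        "sender_mac": sha.hex(":"),
        "sender_ip": socket.inet_ntoa(spa),
        "target_ip": socket.inet_ntoa(tpa),
    }


def unexpected_garps(payloads, egress_ip: str, assignee_mac: str) -> list:
    """Return sender MACs of GARPs for egress_ip that did not come from the assignee."""
    offenders = []
    for payload in payloads:
        arp = parse_arp(payload)
        gratuitous = arp["sender_ip"] == arp["target_ip"]
        if gratuitous and arp["sender_ip"] == egress_ip and arp["sender_mac"] != assignee_mac.lower():
            offenders.append(arp["sender_mac"])
    return offenders


mac_a, mac_b = "02:00:00:00:00:0a", "02:00:00:00:00:0b"  # node A (rebooted), node B (assignee)
capture = [
    build_arp(mac_b, "192.0.2.10", "192.0.2.10"),  # legitimate GARP from node B
    build_arp(mac_a, "192.0.2.10", "192.0.2.10"),  # stale GARP from node A
    build_arp(mac_a, "192.0.2.1", "192.0.2.2"),    # ordinary ARP request, ignored
]
print(unexpected_garps(capture, "192.0.2.10", mac_b))
```

A non-empty result for the egressIP after the reboot would correspond to the behaviour reported here.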

      Actual results:

      The previous egressIP assignee node sends GARPs for an egressIP that is not assigned to it, causing neighbors in the network to record the wrong MAC address for the egressIP in their ARP tables.

      Expected results:

      The previous egressIP assignee node should not send GARPs for an egressIP not assigned to it.

      Additional info:
      This behavior is particularly problematic when the underlying infrastructure has an ARP caching mechanism enabled.
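To illustrate why ARP caching makes this worse, the sketch below models a neighbor that caches the most recently learned MAC for an IP until a timeout expires. It is a simplified toy model, not a description of any particular switch or operating system. The `ArpCache` class, the 300-second TTL, and the addresses are assumptions chosen for the example.

```python
class ArpCache:
    """Toy ARP cache: the last learned MAC wins and is kept until the TTL expires."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.entries = {}  # ip -> (mac, time the entry was learned)

    def learn(self, ip: str, mac: str, now: float) -> None:
        """Record a mapping, for example on receipt of a GARP."""
        self.entries[ip] = (mac, now)

    def lookup(self, ip: str, now: float):
        """Return the cached MAC, or None if there is no entry or it has expired."""
        entry = self.entries.get(ip)
        if entry is None:
            return None
        mac, learned_at = entry
        if now - learned_at > self.ttl:
            del self.entries[ip]
            return None
        return mac


mac_a, mac_b = "02:00:00:00:00:0a", "02:00:00:00:00:0b"
cache = ArpCache(ttl_seconds=300)

cache.learn("192.0.2.10", mac_b, now=0)    # node B takes over the egressIP
cache.learn("192.0.2.10", mac_a, now=60)   # rebooted node A sends a stale GARP

# Until the entry expires or node B announces again, traffic for the
# egressIP is directed at node A's MAC.
print(cache.lookup("192.0.2.10", now=120))
```

In this model the stale entry persists for the whole TTL, so the longer the cache lifetime in the underlying infrastructure, the longer traffic for the egressIP can be misdirected.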

              mkennell@redhat.com Martin Kennelly
              rhn-support-cpassare Christian Passarelli
              Jean Chen Jean Chen