OpenShift Bugs: OCPBUGS-77505

Node stops announcing its address when it is cordoned when using the MetalLB Operator in OpenShift v4.18

    • Critical

      Description of problem:

      A customer has reported an issue with MetalLB in OpenShift v4.18. The same issue was previously reported against upstream MetalLB v0.14.9 and was eventually resolved by upgrading to v0.15.2. However, with metallb-operator.v4.18.0-202601201947 we are seeing the same issue again.

      • The change in PR 2470 [1] causes a node to stop announcing its address when the node is cordoned. For the customer, this is a landmine that could catch them by surprise and cause an outage.
      • That change was reverted in PR 2715 [2], which shipped in upstream v0.15.0.

      [1] https://github.com/metallb/metallb/pull/2470
      [2] https://github.com/metallb/metallb/pull/2715

      The customer also confirmed that the misbehavior does not occur on their OpenShift v4.19 cluster, which runs metallb-operator.v4.19.0-202601120612. On that cluster, a cordoned node continues to announce addresses.

      Version-Release number of selected component (if applicable):

          metallb-operator version metallb-operator.v4.18.0-202601201947

      How reproducible:

          Always

      Steps to Reproduce:

          1. Configure MetalLB to announce an IP address in L2 mode.
          2. Cordon the node that is currently announcing the IP address.
          3. Observe whether the node continues to announce the address.
          

      Actual results:

          The node stops announcing its address once it is cordoned.

      Expected results:

          A node should continue to announce addresses after it is cordoned.

      Additional info:

      I also opened a Slack thread [3] with engineering. As agreed there, I am filing this bug so that engineering can work on the backport.

      [3] https://redhat-internal.slack.com/archives/C01EH16NFPZ/p1771928354721749

              Federico Paolinelli (fpaoline@redhat.com)
              Mridul Markandey (rhn-support-mmarkand)
              Arti Sood