OCPBUGS-60160

Node not ready failures on Azure due to networking issue


    • Bug
    • Resolution: Unresolved
    • Normal
    • 4.18.0
    • kube-apiserver
    • Quality / Stability / Reliability
    • False
    • Moderate

      Description of problem:

      [sig-node] node-lifecycle detects unexpected not ready node
      [sig-node] node-lifecycle detects unreachable state on node
      
      The two tests above fail in roughly 20% of Azure upgrade job runs on average because of a networking issue. According to the initial investigation, established TCP connections between the kubelets and the Azure load balancer VIP are being black-holed. This may be related to the hairpin NAT issue with Azure LB, even though that issue was supposed to be addressed by the custom routes script on master in machine-config-operator.
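
A minimal sketch of how the black-holing symptom might be told apart from an ordinary disconnect, assuming the check runs against an already-established connection on which a response is expected. The function name `classify_connection` and its return labels are hypothetical and are not taken from the kubelet or any OpenShift component. A black-holed connection typically shows neither data nor a FIN/RST, only a timeout, whereas a healthy peer that drops the connection produces a close or a reset.

```python
import socket


def classify_connection(sock: socket.socket, timeout: float = 2.0) -> str:
    """Classify what an established connection does within `timeout` seconds.

    Returns (hypothetical labels):
      "data"   - bytes are available to read
      "closed" - the peer shut the connection down cleanly (FIN)
      "reset"  - the peer reset the connection (RST)
      "silent" - nothing arrived at all; consistent with a black-holed
                 connection, but also with a peer that is merely idle
    """
    previous = sock.gettimeout()
    sock.settimeout(timeout)
    try:
        # MSG_PEEK looks at pending data without consuming it, so the
        # caller can still read the bytes afterwards.
        chunk = sock.recv(1, socket.MSG_PEEK)
    except socket.timeout:
        return "silent"
    except ConnectionResetError:
        return "reset"
    finally:
        # Restore whatever timeout the caller had configured.
        sock.settimeout(previous)
    return "data" if chunk else "closed"
```

A "silent" result is only meaningful if the caller has just sent a request that should have been answered; on its own it does not prove that packets are being dropped.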

      Version-Release number of selected component (if applicable):

          4.18

      How reproducible:

          Run a rollout or upgrade job for Azure on OCP 4.18.

      Steps to Reproduce:

          1.
          2.
          3.
          

      Actual results:

          

      Expected results:

          

      Additional info:

      Here is the doc on the networking issue: https://docs.google.com/document/d/1rYjqiEs3hlOZwL_L5oKy_WcLvAzZoi_pu3023QqU4CE/

              Unassigned
              Vu Dinh (vdinh@redhat.com)
              Ke Wang