  1. OpenShift Bugs
  2. OCPBUGS-11299

[Azure][CCM] Replace master failed as new master did not add into lb backend

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • None
    • 4.14
    • None
    • Important
    • None
    • CLOUD Sprint 234, CLOUD Sprint 235
    • 2
    • Approved
    • False

      This bug is a clone of OCPBUGS-11143 for the out-of-tree Azure CCM provider.

      This is the same regression we had in the in-tree legacy Azure provider, but for the out-of-tree Azure CCM provider, to which we are migrating in 4.14.

      Description of problem:

      On Azure, after a master is deleted, the old machine gets stuck in Deleting and some pods in the cluster go into ImagePullBackOff. The Azure console shows that the new master was not added to the load balancer backend pool, which seems to leave the machine without an internet connection.
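
The backend pool membership can also be checked from the command line instead of the Azure console. The following is a minimal sketch, assuming the Azure CLI (`az`) is logged in to the right subscription and that the pool's JSON output references a NIC whose name embeds the machine name. The helper name `lb_has_machine` and the resource group argument are hypothetical and do not come from this report.

```shell
#!/usr/bin/env bash
# Report whether a machine name appears anywhere in the backend pools
# of a given Azure load balancer.
# Assumption: the NIC referenced by the pool embeds the machine name.
lb_has_machine() {
  local rg="$1" lb="$2" machine="$3"
  if az network lb address-pool list \
       --resource-group "$rg" --lb-name "$lb" -o json \
     | grep -qF -- "$machine"; then
    echo "$machine: found in backend pools of $lb"
  else
    echo "$machine: NOT found in backend pools of $lb"
    return 1
  fi
}

# Usage (load balancer and machine names taken from this report,
# the resource group is a placeholder):
#   lb_has_machine <resource-group> zhsunaz2132-5ctmh-internal zhsunaz2132-5ctmh-master-flqqr-0
```

A non-zero return for the replacement master would match the symptom described above.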

      Version-Release number of selected component (if applicable):

      4.14

      How reproducible:

      Always

      Steps to Reproduce:

      1. Set up a cluster on Azure with networkType ovn
      2. Delete a master machine
      3. Check the master machines, nodes, and pods
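
The steps above can be sketched as a small script. This is only an illustration, assuming a logged-in `oc` session with cluster-admin rights. The function names `delete_master` and `check_cluster` are hypothetical, and `--wait=false` is added here so the command returns while the machine is still being removed.

```shell
#!/usr/bin/env bash
# Namespace that holds Machine objects in OpenShift.
NAMESPACE="openshift-machine-api"

# Step 2: delete a control plane machine without blocking on finalizers.
delete_master() {
  oc delete machine "$1" -n "$NAMESPACE" --wait=false
}

# Step 3: list machines and nodes, then any pods stuck pulling images.
check_cluster() {
  oc get machine -n "$NAMESPACE"
  oc get node
  # grep exits non-zero when nothing matches; that is not an error here.
  oc get pods --all-namespaces | grep ImagePullBackOff || true
}

# Usage (machine name taken from the output in this report):
#   delete_master zhsunaz2132-5ctmh-master-0
#   check_cluster
```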
      

      Actual results:

      The old machine is stuck in Deleting, and some pods are in ImagePullBackOff.
      $ oc get machine
      NAME                                    PHASE      TYPE              REGION   ZONE   AGE
      zhsunaz2132-5ctmh-master-0              Deleting   Standard_D8s_v3   westus          160m
      zhsunaz2132-5ctmh-master-1              Running    Standard_D8s_v3   westus          160m
      zhsunaz2132-5ctmh-master-2              Running    Standard_D8s_v3   westus          160m
      zhsunaz2132-5ctmh-master-flqqr-0        Running    Standard_D8s_v3   westus          105m
      zhsunaz2132-5ctmh-worker-westus-dhwfz   Running    Standard_D4s_v3   westus          152m
      zhsunaz2132-5ctmh-worker-westus-dw895   Running    Standard_D4s_v3   westus          152m
      zhsunaz2132-5ctmh-worker-westus-xlsgm   Running    Standard_D4s_v3   westus          152m
      
      $ oc describe machine zhsunaz2132-5ctmh-master-flqqr-0  -n openshift-machine-api |grep -i "Load Balancer"
            Internal Load Balancer:  zhsunaz2132-5ctmh-internal
            Public Load Balancer:      zhsunaz2132-5ctmh
      
      $ oc get node            
      NAME                                    STATUS     ROLES                  AGE    VERSION
      zhsunaz2132-5ctmh-master-0              Ready      control-plane,master   165m   v1.26.0+149fe52
      zhsunaz2132-5ctmh-master-1              Ready      control-plane,master   165m   v1.26.0+149fe52
      zhsunaz2132-5ctmh-master-2              Ready      control-plane,master   165m   v1.26.0+149fe52
      zhsunaz2132-5ctmh-master-flqqr-0        NotReady   control-plane,master   109m   v1.26.0+149fe52
      zhsunaz2132-5ctmh-worker-westus-dhwfz   Ready      worker                 152m   v1.26.0+149fe52
      zhsunaz2132-5ctmh-worker-westus-dw895   Ready      worker                 152m   v1.26.0+149fe52
      zhsunaz2132-5ctmh-worker-westus-xlsgm   Ready      worker                 152m   v1.26.0+149fe52
      $ oc describe node zhsunaz2132-5ctmh-master-flqqr-0
        Warning  ErrorReconcilingNode       3m5s (x181 over 108m)  controlplane         [k8s.ovn.org/node-chassis-id annotation not found for node zhsunaz2132-5ctmh-master-flqqr-0, macAddress annotation not found for node "zhsunaz2132-5ctmh-master-flqqr-0" , k8s.ovn.org/l3-gateway-config annotation not found for node "zhsunaz2132-5ctmh-master-flqqr-0"]
      
      $ oc get po --all-namespaces | grep ImagePullBackOf   
      openshift-cluster-csi-drivers                      azure-disk-csi-driver-node-l8ng4                                  0/3     Init:ImagePullBackOff   0              113m
      openshift-cluster-csi-drivers                      azure-file-csi-driver-node-99k82                                  0/3     Init:ImagePullBackOff   0              113m
      openshift-cluster-node-tuning-operator             tuned-bvvh7                                                       0/1     ImagePullBackOff        0              113m
      openshift-dns                                      node-resolver-2p4zq                                               0/1     ImagePullBackOff        0              113m
      openshift-image-registry                           node-ca-vxv87                                                     0/1     ImagePullBackOff        0              113m
      openshift-machine-config-operator                  machine-config-daemon-crt5w                                       1/2     ImagePullBackOff        0              113m
      openshift-monitoring                               node-exporter-mmjsm                                               0/2     Init:ImagePullBackOff   0              113m
      openshift-multus                                   multus-4cg87                                                      0/1     ImagePullBackOff        0              113m
      openshift-multus                                   multus-additional-cni-plugins-mc6vx                               0/1     Init:ImagePullBackOff   0              113m
      openshift-ovn-kubernetes                           ovnkube-master-qjjsv                                              0/6     ImagePullBackOff        0              113m
      openshift-ovn-kubernetes                           ovnkube-node-k8w6j                                                0/6     ImagePullBackOff        0              113m
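
The symptoms in the output above can be picked out mechanically when re-testing. Below is a minimal sketch, assuming the default column layout of `oc get machine` and `oc get node` (name in the first column, phase or status in the second). The filter names `stuck_machines` and `not_ready_nodes` are hypothetical.

```shell
#!/usr/bin/env bash
# Read `oc get machine` output on stdin and print machines in phase Deleting.
stuck_machines() {
  awk 'NR > 1 && $2 == "Deleting" { print $1 }'
}

# Read `oc get node` output on stdin and print nodes with status NotReady.
not_ready_nodes() {
  awk 'NR > 1 && $2 == "NotReady" { print $1 }'
}

# Usage:
#   oc get machine -n openshift-machine-api | stuck_machines
#   oc get node | not_ready_nodes
```

Empty output from both filters after a master replacement would indicate that neither symptom is present.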

      Expected results:

      The master is replaced successfully.

              ddonati@redhat.com Damiano Donati
              rhn-support-zhsun Zhaohua Sun
              Zhaohua Sun
              Riccardo Ravaioli