Uploaded image for project: 'OpenShift Bugs'
  1. OpenShift Bugs
  2. OCPBUGS-51090

ASH azure-cloud-node-manager pod stuck in CrashLoopBackOff state

XMLWordPrintable

    • Quality / Stability / Reliability
    • False
    • Hide

      None

      Show
      None
    • None
    • Critical
    • Yes
    • None
    • Approved
    • CLOUD Sprint 267, CLOUD Sprint 268, CLOUD Sprint 269
    • 3
    • In Progress
    • Release Note Not Required
    • None
    • None
    • None
    • None
    • None

      Description of problem:

      Azure ASH cluster install failed. azure-cloud-node-manager pod stuck in CrashLoopBackOff state 
      
       oc get pod -n openshift-cloud-controller-manager
      NAME                                              READY   STATUS             RESTARTS         AGE
      azure-cloud-controller-manager-77646f8854-87b89   1/1     Running            0                69m
      azure-cloud-controller-manager-77646f8854-h8wrs   1/1     Running            0                69m
      azure-cloud-node-manager-72bbx                    0/1     CrashLoopBackOff   18 (2m44s ago)   69m
      azure-cloud-node-manager-sskdq                    0/1     CrashLoopBackOff   18 (2m ago)      69m
      azure-cloud-node-manager-tm84r                    0/1     CrashLoopBackOff   18 (2m6s ago)    69m
      
      I0220 02:56:09.694591       1 nodemanager.go:512] Adding node label from cloud provider: node.kubernetes.io/instance-type=Standard_DS4_v2
      E0220 02:56:09.694707       1 panic.go:262] "Observed a panic" panic="runtime error: invalid memory address or nil pointer dereference" panicGoValue="\"invalid memory address or nil pointer dereference\"" stacktrace=<
      	goroutine 233 [running]:
      	k8s.io/apimachinery/pkg/util/runtime.logPanic({0x32b2158, 0x4d1c000}, {0x2877c40, 0x4c16360})
      		/go/src/github.com/openshift/cloud-provider-azure/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:107 +0xbc
      	k8s.io/apimachinery/pkg/util/runtime.handleCrash({0x32b2158, 0x4d1c000}, {0x2877c40, 0x4c16360}, {0x4d1c000, 0x0, 0x43cb25?})
      		/go/src/github.com/openshift/cloud-provider-azure/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:82 +0x5e
      	k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc000604380?})
      		/go/src/github.com/openshift/cloud-provider-azure/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:59 +0x108
      	panic({0x2877c40?, 0x4c16360?})
      		/usr/lib/golang/src/runtime/panic.go:785 +0x132
      	sigs.k8s.io/cloud-provider-azure/pkg/provider.(*availabilitySet).GetZoneByNodeName(0xc00017e018, {0x32b20b0?, 0x4d1c000?}, {0xc00005acd8?, 0x2?})
      		/go/src/github.com/openshift/cloud-provider-azure/pkg/provider/azure_standard.go:583 +0x71
      	sigs.k8s.io/cloud-provider-azure/pkg/provider.(*Cloud).GetZone(0xc000602308, {0x32b20b0, 0x4d1c000})

      Version-Release number of selected component (if applicable):

          

      How reproducible:

      Always

      Steps to Reproduce:

          1. Install cluster on azure ash
          2.
          3.
          

      Actual results:

      Cluster install failed    

      Expected results:

      Cluster install succeed       

      Additional info:

      failed job https://jenkins-csb-openshift-qe-mastern.dno.corp.redhat.com/job/ocp-common/job/Flexy-install/335805/console 

      panic is here https://github.com/openshift/cloud-provider-azure/blob/main/pkg/provider/azure_standard.go#L583 

      seems rebase pr introduced this issue.

      upstream pr https://github.com/kubernetes-sigs/cloud-provider-azure/pull/7833 

              rmanak@redhat.com Radek Manak
              rhn-support-zhsun Zhaohua Sun
              None
              None
              Zhaohua Sun Zhaohua Sun
              None
              Votes:
              0 Vote for this issue
              Watchers:
              11 Start watching this issue

                Created:
                Updated:
                Resolved: