OpenShift Bugs / OCPBUGS-31467

az.EnsureHostInPool panic when Azure VM instance not found


Details

    • No
    • CLOUD Sprint 251, CLOUD Sprint 252, CLOUD Sprint 253
    • 3
    • False

    Description

      Description of problem:

          On Azure, when kube-controller-manager verifies whether a machine exists, the code may panic with SIGSEGV if the machine was already deleted.
      
      I0320 12:02:55.806321       1 azure_backoff.go:91] GetVirtualMachineWithRetry(worker-e32ads-westeurope2-f72dr): backoff success
      I0320 12:02:56.028287       1 azure_wrap.go:201] Virtual machine "worker-e16as-westeurope1-hpz2t" is under deleting
      I0320 12:02:56.028328       1 azure_standard.go:752] GetPrimaryInterface(worker-e16as-westeurope1-hpz2t, ) abort backoff
      E0320 12:02:56.028334       1 azure_standard.go:825] error: az.EnsureHostInPool(worker-e16as-westeurope1-hpz2t), az.VMSet.GetPrimaryInterface.Get(worker-e16as-westeurope1-hpz2t, ), err=instance not found
      panic: runtime error: invalid memory address or nil pointer dereference
      [signal SIGSEGV: segmentation violation code=0x1 addr=0x60 pc=0x33d21f6]

      goroutine 240642 [running]:
      k8s.io/legacy-cloud-providers/azure.(*availabilitySet).EnsureHostInPool(0xc000016580, 0xc0262fb400, {0xc02d8a5080, 0x32}, {0xc021c1bc70, 0xc4}, {0x0, 0x0}, 0xa8?)
              vendor/k8s.io/legacy-cloud-providers/azure/azure_standard.go:831 +0x4f6
      k8s.io/legacy-cloud-providers/azure.(*availabilitySet).EnsureHostsInPool.func2()
              vendor/k8s.io/legacy-cloud-providers/azure/azure_standard.go:928 +0x5f
      k8s.io/apimachinery/pkg/util/errors.AggregateGoroutines.func1(0xc0159d0788?)
      

      Version-Release number of selected component (if applicable):

          4.12.48
      

      (this release ships https://github.com/openshift/kubernetes/commit/6df21776c7879727ab53895df8a03e53fb725d74)
      The issue was introduced by https://github.com/kubernetes/kubernetes/pull/111428/files#diff-0414c3aba906b2c0cdb2f09da32bd45c6bf1df71cbb2fc55950743c99a4a5fe4
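
      The failure mode suggested by the log and stack trace above can be sketched in isolation: a lookup returns a nil result together with an "instance not found" error, the caller logs the error but keeps going, and the next field access dereferences nil. This is a minimal illustration, not the legacy-cloud-providers code. Every identifier in it (nic, getPrimaryInterface, ensureHostInPoolUnsafe, ensureHostInPoolSafe, errInstanceNotFound) is a hypothetical stand-in, and the guarded variant shows only one possible way to avoid the panic.

```go
package main

import (
	"errors"
	"fmt"
)

// errInstanceNotFound is a hypothetical stand-in for the provider's
// "instance not found" sentinel error.
var errInstanceNotFound = errors.New("instance not found")

// nic is a hypothetical stand-in for a network interface object.
type nic struct {
	ProvisioningState string
}

// getPrimaryInterface mimics a lookup for a VM that is already gone:
// it returns a nil result together with errInstanceNotFound.
func getPrimaryInterface(vmName string) (*nic, error) {
	return nil, errInstanceNotFound
}

// ensureHostInPoolUnsafe returns early for every error except
// errInstanceNotFound, so in that case it falls through with n == nil.
func ensureHostInPoolUnsafe(vmName string) (string, error) {
	n, err := getPrimaryInterface(vmName)
	if err != nil && !errors.Is(err, errInstanceNotFound) {
		return "", err
	}
	return n.ProvisioningState, nil // nil pointer dereference when n == nil
}

// ensureHostInPoolSafe treats a missing instance as "nothing to do"
// and never touches the nil result.
func ensureHostInPoolSafe(vmName string) (string, error) {
	n, err := getPrimaryInterface(vmName)
	if err != nil {
		if errors.Is(err, errInstanceNotFound) {
			return "", nil // skip the vanished VM instead of failing
		}
		return "", err
	}
	if n == nil {
		return "", nil
	}
	return n.ProvisioningState, nil
}

// panics reports whether calling f panics.
func panics(f func()) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	unsafePanicked := panics(func() { _, _ = ensureHostInPoolUnsafe("vm-a") })
	fmt.Println("unsafe variant panicked:", unsafePanicked)

	state, err := ensureHostInPoolSafe("vm-a")
	fmt.Printf("safe variant: state=%q err=%v\n", state, err)
}
```

      In this sketch the guarded variant returns early on a missing instance, so a VM that disappears between the existence check and the interface lookup is skipped rather than crashing the process.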

      How reproducible:

          Not reproducible on demand; the panic happens occasionally.
      

      Steps to Reproduce:

          1.
          2.
          3.
          

      Actual results:

          kube-controller-manager panics with a nil pointer dereference (SIGSEGV) in EnsureHostInPool.

      Expected results:

          No panic when the Azure VM instance is not found.

      Additional info:

          internal case 03772590

          People

            rh-ee-nbrubake Nolan Brubaker
            frigault Francois Rigault
            Zhaohua Sun Zhaohua Sun