OpenShift Virtualization / CNV-29383

[2212198] both virt-controllers are crashing due to panic


    • Priority: Urgent

      I'm running a scale regression setup on:
      =========================================
      OCP 4.12.3
      OpenShift Virtualization 4.12.3

      It is a large-scale setup with 130 nodes running 6,000 VMs, using an external RHCS (Red Hat Ceph Storage) cluster as storage.

      During mass VM migration testing, in which I initiated the migration of 2,000 VMs, both virt-controllers started crashing in a loop due to a panic. I'm now unable to initiate any actions and cannot recover the cluster.
      ================================================================================
      virt-controller-7887c7c647-8v4t4 0/1 CrashLoopBackOff 40 (3m57s ago) 10d
      virt-controller-7887c7c647-pnjpq 0/1 CrashLoopBackOff 40 (2m59s ago) 10d
      ================================================================================

      {"component":"virt-controller","level":"info","msg":"Starting disruption budget controller.","pos":"disruptionbudget.go:316","timestamp":"2023-06-04T16:14:40.853832Z"} {"component":"virt-controller","level":"info","msg":"Starting snapshot controller.","pos":"snapshot_base.go:199","timestamp":"2023-06-04T16:14:40.853820Z"} {"component":"virt-controller","level":"info","msg":"Starting clone controller","pos":"clone_base.go:149","timestamp":"2023-06-04T16:14:40.853885Z"} {"component":"virt-controller","level":"info","msg":"Starting vmi controller.","pos":"vmi.go:229","timestamp":"2023-06-04T16:14:40.853842Z"} {"component":"virt-controller","level":"info","msg":"Starting export controller.","pos":"export.go:290","timestamp":"2023-06-04T16:14:40.854063Z"} {"component":"virt-controller","level":"info","msg":"TSC Freqency node update status: 0 updated, 129 skipped, 0 errors","pos":"nodetopologyupdater.go:44","timestamp":"2023-06-04T16:14:41.166980Z"} {"component":"virt-controller","level":"info","msg":"certificate with common name 'virt-controller.openshift-cnv.pod.cluster.local' retrieved.","pos":"cert-manager.go:198","timestamp":"2023-06-04T16:14:43.537128Z"} {"component":"virt-controller","level":"info","msg":"certificate with common name 'virt-controller.openshift-cnv.pod.cluster.local' retrieved.","pos":"cert-manager.go:198","timestamp":"2023-06-04T16:14:43.537270Z"} {"component":"virt-controller","level":"info","msg":"certificate with common name 'export.kubevirt.io@1685870363' retrieved.","pos":"cert-manager.go:198","timestamp":"2023-06-04T16:14:43.537273Z"} {"component":"virt-controller","level":"info","msg":"certificate with common name 'export.kubevirt.io@1685870363' retrieved.","pos":"cert-manager.go:198","timestamp":"2023-06-04T16:14:43.537395Z"}

      E0604 16:14:43.755257 1 runtime.go:78] Observed a panic: runtime.boundsError{x:-2, y:0, signed:true, code:0x2} (runtime error: slice bounds out of range [:-2])
      goroutine 1279 [running]:
      k8s.io/apimachinery/pkg/util/runtime.logPanic({0x1bcac20?, 0xc02b374e10})
      /remote-source/app/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0x86
      k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc00096e260?})
      /remote-source/app/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x75
      panic({0x1bcac20, 0xc02b374e10})
      /usr/lib/golang/src/runtime/panic.go:884 +0x212
      kubevirt.io/kubevirt/pkg/virt-controller/watch/drain/evacuation.(*EvacuationController).sync(0xc003787880, 0xc003ead4b0, {0xc02b6635e0?, 0x4, 0x4}, {0xc00234e200?, 0x16, 0x20})
      /remote-source/app/pkg/virt-controller/watch/drain/evacuation/evacuation.go:415 +0x997
      kubevirt.io/kubevirt/pkg/virt-controller/watch/drain/evacuation.(*EvacuationController).execute(0xc003787880, {0xc003d12bb0, 0x9})
      /remote-source/app/pkg/virt-controller/watch/drain/evacuation/evacuation.go:335 +0x176
      kubevirt.io/kubevirt/pkg/virt-controller/watch/drain/evacuation.(*EvacuationController).Execute(0xc003787880)
      /remote-source/app/pkg/virt-controller/watch/drain/evacuation/evacuation.go:296 +0x108
      kubevirt.io/kubevirt/pkg/virt-controller/watch/drain/evacuation.(*EvacuationController).runWorker(0xc003333ea0?)
      /remote-source/app/pkg/virt-controller/watch/drain/evacuation/evacuation.go:286 +0x25
      k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x0?)
      /remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x3e
      k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0005e69c0?, {0x212aa80, 0xc02b981da0}, 0x1, 0xc003cf8b40)
      /remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0xb6
      k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0x0?)
      /remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x89
      k8s.io/apimachinery/pkg/util/wait.Until(0x0?, 0x1f776b8?, 0xc003333f88?)
      /remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 +0x25
      created by kubevirt.io/kubevirt/pkg/virt-controller/watch/drain/evacuation.(*EvacuationController).Run
      /remote-source/app/pkg/virt-controller/watch/drain/evacuation/evacuation.go:278 +0x275
      panic: runtime error: slice bounds out of range [:-2] [recovered]
      panic: runtime error: slice bounds out of range [:-2]

      goroutine 1279 [running]:
      k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc00096e260?})
      /remote-source/app/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:55 +0xd7
      panic({0x1bcac20, 0xc02b374e10})
      /usr/lib/golang/src/runtime/panic.go:884 +0x212
      kubevirt.io/kubevirt/pkg/virt-controller/watch/drain/evacuation.(*EvacuationController).sync(0xc003787880, 0xc003ead4b0, {0xc02b6635e0?, 0x4, 0x4}, {0xc00234e200?, 0x16, 0x20})
      /remote-source/app/pkg/virt-controller/watch/drain/evacuation/evacuation.go:415 +0x997
      kubevirt.io/kubevirt/pkg/virt-controller/watch/drain/evacuation.(*EvacuationController).execute(0xc003787880, {0xc003d12bb0, 0x9})
      /remote-source/app/pkg/virt-controller/watch/drain/evacuation/evacuation.go:335 +0x176
      kubevirt.io/kubevirt/pkg/virt-controller/watch/drain/evacuation.(*EvacuationController).Execute(0xc003787880)
      /remote-source/app/pkg/virt-controller/watch/drain/evacuation/evacuation.go:296 +0x108
      kubevirt.io/kubevirt/pkg/virt-controller/watch/drain/evacuation.(*EvacuationController).runWorker(0xc003333ea0?)
      /remote-source/app/pkg/virt-controller/watch/drain/evacuation/evacuation.go:286 +0x25
      k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x0?)
      /remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x3e
      k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0005e69c0?, {0x212aa80, 0xc02b981da0}, 0x1, 0xc003cf8b40)
      /remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0xb6
      k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0x0?)
      /remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x89
      k8s.io/apimachinery/pkg/util/wait.Until(0x0?, 0x1f776b8?, 0xc003333f88?)
      /remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 +0x25
      created by kubevirt.io/kubevirt/pkg/virt-controller/watch/drain/evacuation.(*EvacuationController).Run
      /remote-source/app/pkg/virt-controller/watch/drain/evacuation/evacuation.go:278 +0x275
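
      The panic itself is an ordinary Go slice-bounds error: EvacuationController.sync (evacuation.go:415) evaluates a slice expression whose upper bound came out as -2. A minimal sketch of this failure mode, assuming the controller trims a candidate list down to the number of remaining migration slots (the names and logic below are illustrative, not the actual KubeVirt code):

      package main

      import "fmt"

      // selectCandidates mimics the assumed pattern: take at most "remaining
      // slots" candidates from the list. Without the clamp below, a negative
      // remaining value (e.g. maxParallel=5, inFlight=7 -> -2) would make
      // candidates[:remaining] panic with "slice bounds out of range [:-2]".
      func selectCandidates(candidates []string, maxParallel, inFlight int) []string {
              remaining := maxParallel - inFlight
              if remaining < 0 {
                      remaining = 0
              }
              if remaining > len(candidates) {
                      remaining = len(candidates)
              }
              return candidates[:remaining]
      }

      func main() {
              // 7 migrations already in flight against a limit of 5 -> remaining would be -2.
              fmt.Println(selectCandidates([]string{"vmi-a", "vmi-b", "vmi-c"}, 5, 7))
      }

      Given that 2,000 migrations were requested at once, it seems plausible that the in-flight count briefly exceeded the configured parallelism and produced the negative bound; the clamp above only illustrates the class of fix, not the actual patch.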
      ================================================================================
      logs:

      http://perf148h.perf.lab.eng.bos.redhat.com/share/BZ_logs/virt_controller_panic_during_migration.gz
      ================================================================================

              Assignee: sgott@redhat.com Stuart Gott
              Reporter: bbenshab Boaz Ben Shabat