OpenShift Bugs / OCPBUGS-2196

Symptom Detection.Undiagnosed panic detected in pod

    • Type: Bug
    • Resolution: Done
    • Priority: Undefined
    • None
    • 4.10.z
    • Monitoring

      This bug is a backport clone of [Bugzilla Bug 2075091](https://bugzilla.redhat.com/show_bug.cgi?id=2075091). The following is the description of the original bug:

      The test "Symptom Detection.Undiagnosed panic detected in pod" is failing frequently in CI; see:
      https://sippy.ci.openshift.org/sippy-ng/tests/4.11/analysis?test=Symptom%20Detection.Undiagnosed%20panic%20detected%20in%20pod

      This problem appears to have existed before, but the number of failures surged and caused two nightly payloads to be rejected:

      https://amd64.ocp.releases.ci.openshift.org/releasestream/4.11.0-0.nightly/release/4.11.0-0.nightly-2022-04-12-150057
      https://amd64.ocp.releases.ci.openshift.org/releasestream/4.11.0-0.nightly/release/4.11.0-0.nightly-2022-04-12-185124

      After that, it mysteriously disappeared.

      Here is a specific case:

      https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.11-e2e-azure-ovn-upgrade/1513895844351315968

      Message from the test case:

      { pods/openshift-monitoring_kube-state-metrics-67c5b7c7c6-88vxn_kube-state-metrics_previous.log.gz:E0412 15:52:33.358619 1 runtime.go:78] Observed a panic: runtime.boundsError{x:4, y:4, signed:true, code:0x0} (runtime error: index out of range [4] with length 4)}

      Panic trace from https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.11-e2e-azure-ovn-upgrade/1513895844351315968/artifacts/e2e-azure-ovn-upgrade/gather-extra/artifacts/pods/openshift-monitoring_kube-state-metrics-67c5b7c7c6-88vxn_kube-state-metrics_previous.log:

      E0412 15:52:33.358619 1 runtime.go:78] Observed a panic: runtime.boundsError{x:4, y:4, signed:true, code:0x0} (runtime error: index out of range [4] with length 4)
      goroutine 77 [running]:
      k8s.io/apimachinery/pkg/util/runtime.logPanic({0x1741840, 0xc000b635f0})
          /go/src/k8s.io/kube-state-metrics/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0x7d
      k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc000ac9740})
          /go/src/k8s.io/kube-state-metrics/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x75
      panic({0x1741840, 0xc000b635f0})
          /usr/lib/golang/src/runtime/panic.go:1038 +0x215
      k8s.io/kube-state-metrics/v2/internal/store.createPodContainerInfoFamilyGenerator.func1(0xc003422c00)
          /go/src/k8s.io/kube-state-metrics/internal/store/pod.go:134 +0x375
      k8s.io/kube-state-metrics/v2/internal/store.wrapPodFunc.func1({0x1804880, 0xc003422c00})
          /go/src/k8s.io/kube-state-metrics/internal/store/pod.go:1386 +0x5a
      k8s.io/kube-state-metrics/v2/pkg/metric_generator.(*FamilyGenerator).Generate(...)
          /go/src/k8s.io/kube-state-metrics/pkg/metric_generator/generator.go:67
      k8s.io/kube-state-metrics/v2/pkg/metric_generator.ComposeMetricGenFuncs.func1({0x1804880, 0xc003422c00})
          /go/src/k8s.io/kube-state-metrics/pkg/metric_generator/generator.go:107 +0xd8
      k8s.io/kube-state-metrics/v2/pkg/metrics_store.(*MetricsStore).Add(0xc0000c13c0, {0x1804880, 0xc003422c00})
          /go/src/k8s.io/kube-state-metrics/pkg/metrics_store/metrics_store.go:72 +0xd4
      k8s.io/kube-state-metrics/v2/pkg/metrics_store.(*MetricsStore).Update(0xc003422c00, {0x1804880, 0xc003422c00})
          /go/src/k8s.io/kube-state-metrics/pkg/metrics_store/metrics_store.go:87 +0x25
      k8s.io/client-go/tools/cache.(*Reflector).watchHandler(0xc000192fc0, {0x0, 0x0, 0x26cdee0}, {0x1a373f8, 0xc0011c24c0}, 0xc000623d60, 0xc0005ff380, 0xc0002cc480)
          /go/src/k8s.io/kube-state-metrics/vendor/k8s.io/client-go/tools/cache/reflector.go:506 +0xa55
      k8s.io/client-go/tools/cache.(*Reflector).ListAndWatch(0xc000192fc0, 0xc0002cc480)
          /go/src/k8s.io/kube-state-metrics/vendor/k8s.io/client-go/tools/cache/reflector.go:429 +0x696
      k8s.io/client-go/tools/cache.(*Reflector).Run.func1()
          /go/src/k8s.io/kube-state-metrics/vendor/k8s.io/client-go/tools/cache/reflector.go:221 +0x26
      k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x7f02ffada1d0)
          /go/src/k8s.io/kube-state-metrics/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x67
      k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc00036a2c0, {0x1a1daa0, 0xc000386e60}, 0x1, 0xc0002cc480)
          /go/src/k8s.io/kube-state-metrics/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0xb6
      k8s.io/client-go/tools/cache.(*Reflector).Run(0xc000192fc0, 0xc0002cc480)
          /go/src/k8s.io/kube-state-metrics/vendor/k8s.io/client-go/tools/cache/reflector.go:220 +0x1f8
      created by k8s.io/kube-state-metrics/v2/internal/store.(*Builder).startReflector
          /go/src/k8s.io/kube-state-metrics/internal/store/builder.go:508 +0x2c8
      panic: runtime error: index out of range [4] with length 4 [recovered]
      panic: runtime error: index out of range [4] with length 4

      The trace points to https://github.com/openshift/kube-state-metrics/blob/6efa87f858ee53028fd2de40941b61c09e9ee049/internal/store/pod.go#L134, where the lengths of p.Status.ContainerStatuses and p.Spec.Containers appear to diverge.
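The failure mode can be illustrated in isolation. The sketch below is not the kube-state-metrics code; it assumes, for illustration only, that one slice is indexed with a position taken from a longer slice. The function name `specImagesByStatusIndex` and the sample values are hypothetical.

```go
package main

import "fmt"

// specImagesByStatusIndex is a hypothetical stand-in for code that walks the
// container statuses and uses the same index to read from the spec containers.
// The deferred recover turns the runtime panic into an error so it can be printed.
func specImagesByStatusIndex(specImages, statusNames []string) (out []string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	for i := range statusNames {
		// Panics as soon as i >= len(specImages).
		out = append(out, specImages[i])
	}
	return out, nil
}

func main() {
	spec := []string{"img-a", "img-b", "img-c", "img-d"} // 4 spec containers
	status := []string{"a", "b", "c", "d", "e"}          // 5 statuses (assumed mismatch)

	_, err := specImagesByStatusIndex(spec, status)
	fmt.Println(err)
	// prints: recovered: runtime error: index out of range [4] with length 4
}
```

With four entries on one side and five on the other, the fifth iteration reads index 4 of a slice of length 4, which matches the message in the trace. Which of the two slices was the shorter one in the CI run is not known from the report.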

      Unfortunately, the condition that caused the panic is ephemeral and is not captured in the must-gather data.

      The ask is to safeguard the code against the panic and to log useful debugging information so that offending pods can be tracked down.
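One possible shape for such a safeguard is sketched below. It is an illustration under stated assumptions, not the fix that was merged: the types `pod`, `container`, `containerStatus`, and `infoSample` are trimmed-down hypothetical stand-ins for the Kubernetes and kube-state-metrics types, and matching statuses to spec containers by name (rather than by index) is one option among several.

```go
package main

import (
	"fmt"
	"log"
)

// Hypothetical, minimal stand-ins for the real pod types.
type container struct{ Name, Image string }
type containerStatus struct{ Name, Image string }
type pod struct {
	Namespace, Name string
	Spec            []container
	Status          []containerStatus
}

// infoSample is a hypothetical stand-in for one generated metric sample.
type infoSample struct{ Container, Image, ImageSpec string }

// containerInfo builds one sample per container status. It never indexes the
// spec slice by the status index, so a length mismatch cannot panic; instead
// the mismatch is logged with enough context to identify the pod.
func containerInfo(p pod) []infoSample {
	specImage := make(map[string]string, len(p.Spec))
	for _, c := range p.Spec {
		specImage[c.Name] = c.Image
	}
	if len(p.Spec) != len(p.Status) {
		log.Printf("pod %s/%s: %d spec containers but %d container statuses",
			p.Namespace, p.Name, len(p.Spec), len(p.Status))
	}
	samples := make([]infoSample, 0, len(p.Status))
	for _, cs := range p.Status {
		img, ok := specImage[cs.Name]
		if !ok {
			log.Printf("pod %s/%s: status for container %q has no spec entry",
				p.Namespace, p.Name, cs.Name)
		}
		// img is the empty string when no spec entry matches.
		samples = append(samples, infoSample{Container: cs.Name, Image: cs.Image, ImageSpec: img})
	}
	return samples
}

func main() {
	p := pod{
		Namespace: "ns", Name: "demo",
		Spec:   []container{{Name: "a", Image: "img-a"}},
		Status: []containerStatus{{Name: "a", Image: "img-a@sha"}, {Name: "b", Image: "img-b@sha"}},
	}
	for _, s := range containerInfo(p) {
		fmt.Printf("%+v\n", s)
	}
}
```

Looking up by name keeps the output well-defined even when the two lists differ in order or length, and the log lines carry the namespace, pod name, and container name needed to find the offender later.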

              Simon Pasquier (spasquie@redhat.com)
              OpenShift Prow Bot (openshift-crt-jira-prow)
              Junqi Zhao