  1. OpenShift Bugs
  2. OCPBUGS-2196

Symptom Detection.Undiagnosed panic detected in pod

Details

    • Bug
    • Resolution: Done
    • Undefined
    • None
    • 4.10.z
    • Monitoring

    Description

      This bug is a backport clone of [Bugzilla Bug 2075091](https://bugzilla.redhat.com/show_bug.cgi?id=2075091). The following is the description of the original bug:

      Symptom Detection.Undiagnosed panic detected in pod

      is failing frequently in CI, see:
      https://sippy.ci.openshift.org/sippy-ng/tests/4.11/analysis?test=Symptom%20Detection.Undiagnosed%20panic%20detected%20in%20pod

      This problem appears to have existed before, but the number of cases surged and caused two nightly payloads to be rejected:

      https://amd64.ocp.releases.ci.openshift.org/releasestream/4.11.0-0.nightly/release/4.11.0-0.nightly-2022-04-12-150057
      https://amd64.ocp.releases.ci.openshift.org/releasestream/4.11.0-0.nightly/release/4.11.0-0.nightly-2022-04-12-185124

      After that, it mysteriously disappeared.

      Here is a specific case:

      https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.11-e2e-azure-ovn-upgrade/1513895844351315968

      Message from the test case:

      { pods/openshift-monitoring_kube-state-metrics-67c5b7c7c6-88vxn_kube-state-metrics_previous.log.gz:E0412 15:52:33.358619 1 runtime.go:78] Observed a panic: runtime.boundsError{x:4, y:4, signed:true, code:0x0} (runtime error: index out of range [4] with length 4)}

      Panic trace from https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.11-e2e-azure-ovn-upgrade/1513895844351315968/artifacts/e2e-azure-ovn-upgrade/gather-extra/artifacts/pods/openshift-monitoring_kube-state-metrics-67c5b7c7c6-88vxn_kube-state-metrics_previous.log:

      E0412 15:52:33.358619 1 runtime.go:78] Observed a panic: runtime.boundsError{x:4, y:4, signed:true, code:0x0} (runtime error: index out of range [4] with length 4)
      goroutine 77 [running]:
      k8s.io/apimachinery/pkg/util/runtime.logPanic({0x1741840, 0xc000b635f0})
      /go/src/k8s.io/kube-state-metrics/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0x7d
      k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc000ac9740})
      /go/src/k8s.io/kube-state-metrics/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x75
      panic({0x1741840, 0xc000b635f0})
      /usr/lib/golang/src/runtime/panic.go:1038 +0x215
      k8s.io/kube-state-metrics/v2/internal/store.createPodContainerInfoFamilyGenerator.func1(0xc003422c00)
      /go/src/k8s.io/kube-state-metrics/internal/store/pod.go:134 +0x375
      k8s.io/kube-state-metrics/v2/internal/store.wrapPodFunc.func1({0x1804880, 0xc003422c00})
      /go/src/k8s.io/kube-state-metrics/internal/store/pod.go:1386 +0x5a
      k8s.io/kube-state-metrics/v2/pkg/metric_generator.(*FamilyGenerator).Generate(...)
      /go/src/k8s.io/kube-state-metrics/pkg/metric_generator/generator.go:67
      k8s.io/kube-state-metrics/v2/pkg/metric_generator.ComposeMetricGenFuncs.func1({0x1804880, 0xc003422c00})
      /go/src/k8s.io/kube-state-metrics/pkg/metric_generator/generator.go:107 +0xd8
      k8s.io/kube-state-metrics/v2/pkg/metrics_store.(*MetricsStore).Add(0xc0000c13c0, {0x1804880, 0xc003422c00})
      /go/src/k8s.io/kube-state-metrics/pkg/metrics_store/metrics_store.go:72 +0xd4
      k8s.io/kube-state-metrics/v2/pkg/metrics_store.(*MetricsStore).Update(0xc003422c00, {0x1804880, 0xc003422c00})
      /go/src/k8s.io/kube-state-metrics/pkg/metrics_store/metrics_store.go:87 +0x25
      k8s.io/client-go/tools/cache.(*Reflector).watchHandler(0xc000192fc0, {0x0, 0x0, 0x26cdee0}, {0x1a373f8, 0xc0011c24c0}, 0xc000623d60, 0xc0005ff380, 0xc0002cc480)
      /go/src/k8s.io/kube-state-metrics/vendor/k8s.io/client-go/tools/cache/reflector.go:506 +0xa55
      k8s.io/client-go/tools/cache.(*Reflector).ListAndWatch(0xc000192fc0, 0xc0002cc480)
      /go/src/k8s.io/kube-state-metrics/vendor/k8s.io/client-go/tools/cache/reflector.go:429 +0x696
      k8s.io/client-go/tools/cache.(*Reflector).Run.func1()
      /go/src/k8s.io/kube-state-metrics/vendor/k8s.io/client-go/tools/cache/reflector.go:221 +0x26
      k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x7f02ffada1d0)
      /go/src/k8s.io/kube-state-metrics/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x67
      k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc00036a2c0, {0x1a1daa0, 0xc000386e60}, 0x1, 0xc0002cc480)
      /go/src/k8s.io/kube-state-metrics/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0xb6
      k8s.io/client-go/tools/cache.(*Reflector).Run(0xc000192fc0, 0xc0002cc480)
      /go/src/k8s.io/kube-state-metrics/vendor/k8s.io/client-go/tools/cache/reflector.go:220 +0x1f8
      created by k8s.io/kube-state-metrics/v2/internal/store.(*Builder).startReflector
      /go/src/k8s.io/kube-state-metrics/internal/store/builder.go:508 +0x2c8
      panic: runtime error: index out of range [4] with length 4 [recovered]
      panic: runtime error: index out of range [4] with length 4

      It points to https://github.com/openshift/kube-state-metrics/blob/6efa87f858ee53028fd2de40941b61c09e9ee049/internal/store/pod.go#L134, where the lengths of p.Status.ContainerStatuses and p.Spec.Containers appear to diverge.

      Unfortunately, the condition that caused the panic is ephemeral and is not captured in the must-gather data.

      The ask is to safeguard the code against the panic and to log useful debugging information to track down offenders.
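      One possible shape for such a safeguard is sketched below. It is not the actual kube-state-metrics code or the eventual fix: the Pod, Container, ContainerStatus and InfoMetric types are simplified, hypothetical stand-ins for the real Kubernetes types, and containerInfoMetrics is a made-up name. The sketch assumes the panic comes from indexing the spec containers by a container status's position, and avoids that by matching on container name and logging any status that has no matching spec entry.

```go
package main

import (
	"fmt"
	"log"
)

// Container is a hypothetical stand-in for an entry of p.Spec.Containers.
type Container struct {
	Name  string
	Image string
}

// ContainerStatus is a hypothetical stand-in for an entry of
// p.Status.ContainerStatuses.
type ContainerStatus struct {
	Name        string
	ImageID     string
	ContainerID string
}

// Pod carries only the fields this sketch needs.
type Pod struct {
	Namespace         string
	Name              string
	Containers        []Container
	ContainerStatuses []ContainerStatus
}

// InfoMetric is a hypothetical, simplified metric sample.
type InfoMetric struct {
	Container   string
	ImageSpec   string
	ImageID     string
	ContainerID string
}

// containerInfoMetrics builds one sample per container status. It looks up the
// spec container by name instead of by slice index, so a status list that is
// longer than the spec list cannot cause an out-of-range access.
func containerInfoMetrics(p Pod) []InfoMetric {
	specImages := make(map[string]string, len(p.Containers))
	for _, c := range p.Containers {
		specImages[c.Name] = c.Image
	}

	metrics := make([]InfoMetric, 0, len(p.ContainerStatuses))
	for _, cs := range p.ContainerStatuses {
		image, ok := specImages[cs.Name]
		if !ok {
			// Log enough context to identify the offending pod, then skip it.
			log.Printf("pod %s/%s: status for container %q has no spec entry (spec containers=%d, statuses=%d)",
				p.Namespace, p.Name, cs.Name, len(p.Containers), len(p.ContainerStatuses))
			continue
		}
		metrics = append(metrics, InfoMetric{
			Container:   cs.Name,
			ImageSpec:   image,
			ImageID:     cs.ImageID,
			ContainerID: cs.ContainerID,
		})
	}
	return metrics
}

func main() {
	// Two spec containers but three statuses: the diverging case.
	p := Pod{
		Namespace:  "demo",
		Name:       "example",
		Containers: []Container{{Name: "a", Image: "img-a"}, {Name: "b", Image: "img-b"}},
		ContainerStatuses: []ContainerStatus{
			{Name: "a", ImageID: "id-a"},
			{Name: "b", ImageID: "id-b"},
			{Name: "extra", ImageID: "id-x"},
		},
	}
	for _, m := range containerInfoMetrics(p) {
		fmt.Printf("%s %s %s\n", m.Container, m.ImageSpec, m.ImageID)
	}
}
```

      Matching by name rather than by index also guards against the two lists being ordered differently. A simpler alternative would be a bounds check on the index before the lookup, at the cost of not catching ordering mismatches.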


            People

              Simon Pasquier (spasquie@redhat.com)
              OpenShift Prow Bot (openshift-crt-jira-prow)
              Junqi Zhao