OCPBUGS-15227

CI got "Undiagnosed panic detected in pod" in ovnkube-master because of TypeAssertionError: "interface {} is cache.DeletedFinalStateUnknown, not *v1.Pod"


    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Critical
    • None
    • 4.14
    • Moderate
    • No
    • Proposed
    • False

      Description of problem

      CI failed with an "Undiagnosed panic detected in pod" in ovnkube-master:

      {  pods/openshift-ovn-kubernetes_ovnkube-master-tqzxx_ovnkube-master_previous.log.gz:E0620 21:33:48.429234       1 runtime.go:79] Observed a panic: &runtime.TypeAssertionError{_interface:(*runtime._type)(0x1dd8820), concrete:(*runtime._type)(0x1f01740), asserted:(*runtime._type)(0x20f42e0), missingMethod:""} (interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *v1.Pod)}
      

      This particular failure comes from https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-ingress-operator/951/pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-ovn/1671257883095863296. Search.ci has other similar failures.
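The panic message is what an unchecked `obj.(*v1.Pod)` assertion produces when an informer delete handler receives a `cache.DeletedFinalStateUnknown` tombstone instead of the Pod itself. The following is a minimal sketch of the usual defensive pattern, not the actual ovnkube-master code: `Pod` and `DeletedFinalStateUnknown` are local stand-ins for `*v1.Pod` and client-go's `cache.DeletedFinalStateUnknown` so that the example compiles without external modules, and `podFromDeleteEvent` is a hypothetical helper name.

```go
package main

import "fmt"

// Pod is a stand-in for *v1.Pod (assumption: only the fields needed here).
type Pod struct {
	Namespace, Name string
}

// DeletedFinalStateUnknown mirrors the shape of client-go's
// cache.DeletedFinalStateUnknown: the informer hands this tombstone to a
// delete handler when it missed the delete event and only knows the
// object's last observed state.
type DeletedFinalStateUnknown struct {
	Key string
	Obj interface{}
}

// podFromDeleteEvent (hypothetical helper) extracts a Pod from whatever a
// delete handler was given, unwrapping a tombstone when necessary instead
// of asserting the type directly.
func podFromDeleteEvent(obj interface{}) (*Pod, error) {
	switch t := obj.(type) {
	case *Pod:
		return t, nil
	case DeletedFinalStateUnknown:
		pod, ok := t.Obj.(*Pod)
		if !ok {
			return nil, fmt.Errorf("tombstone %q contained %T, not *Pod", t.Key, t.Obj)
		}
		return pod, nil
	default:
		return nil, fmt.Errorf("unexpected object type %T", obj)
	}
}

func main() {
	// Simulate the informer delivering a tombstone to the delete handler.
	var obj interface{} = DeletedFinalStateUnknown{
		Key: "default/web-0",
		Obj: &Pod{Namespace: "default", Name: "web-0"},
	}

	// An unchecked obj.(*Pod) here would panic with an interface
	// conversion error; the checked path returns the Pod or an error.
	pod, err := podFromDeleteEvent(obj)
	if err != nil {
		fmt.Println("skipping delete event:", err)
		return
	}
	fmt.Printf("handling delete for %s/%s\n", pod.Namespace, pod.Name)
}
```

With this shape, a handler logs and skips an unexpected object rather than crashing the process.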

      Version-Release number of selected component (if applicable)

      I have seen this in 4.14 CI jobs.

      How reproducible

      Presently, search.ci shows the following stats for the past two days:

      Found in 0.06% of runs (0.36% of failures) across 41379 total runs and 5056 jobs (16.87% failed)
      periodic-ci-openshift-multiarch-master-nightly-4.14-ocp-e2e-aws-ovn-arm64-techpreview-serial (all) - 12 runs, 42% failed, 20% of failures match = 8% impact
      periodic-ci-openshift-multiarch-master-nightly-4.14-ocp-e2e-ovn-serial-aws-arm64 (all) - 12 runs, 33% failed, 25% of failures match = 8% impact
      pull-ci-openshift-machine-config-operator-master-e2e-aws-ovn (all) - 11 runs, 73% failed, 13% of failures match = 9% impact
      pull-ci-openshift-aws-ebs-csi-driver-operator-master-e2e-aws-csi-extended (all) - 3 runs, 33% failed, 100% of failures match = 33% impact
      pull-ci-openshift-origin-master-e2e-aws-ovn-serial (all) - 10 runs, 50% failed, 20% of failures match = 10% impact
      pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-ovn (all) - 10 runs, 30% failed, 67% of failures match = 20% impact
      pull-ci-openshift-oc-master-e2e-aws-ovn-serial (all) - 16 runs, 31% failed, 20% of failures match = 6% impact
      pull-ci-openshift-kubernetes-master-e2e-aws-ovn-serial (all) - 8 runs, 50% failed, 25% of failures match = 13% impact
      pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-upgrade-local-gateway (all) - 10 runs, 100% failed, 10% of failures match = 10% impact
      periodic-ci-openshift-multiarch-master-nightly-4.14-ocp-e2e-aws-ovn-arm64 (all) - 12 runs, 17% failed, 50% of failures match = 8% impact
      pull-ci-openshift-installer-master-e2e-aws-ovn (all) - 33 runs, 6% failed, 50% of failures match = 3% impact
      pull-ci-openshift-cluster-network-operator-master-e2e-ovn-step-registry (all) - 4 runs, 25% failed, 200% of failures match = 50% impact
      pull-ci-openshift-cluster-network-operator-master-e2e-vsphere-ovn (all) - 5 runs, 20% failed, 100% of failures match = 20% impact
      pull-ci-openshift-cluster-capi-operator-main-e2e-aws-ovn (all) - 6 runs, 33% failed, 50% of failures match = 17% impact
      pull-ci-openshift-cluster-capi-operator-main-e2e-aws-capi-techpreview (all) - 6 runs, 17% failed, 100% of failures match = 17% impact
      pull-ci-openshift-cluster-ingress-operator-master-e2e-gcp-ovn (all) - 9 runs, 33% failed, 33% of failures match = 11% impact
      pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-ovn-single-node (all) - 10 runs, 30% failed, 33% of failures match = 10% impact
      pull-ci-openshift-cluster-network-operator-master-e2e-aws-ovn-windows (all) - 11 runs, 45% failed, 20% of failures match = 9% impact
      pull-ci-openshift-cluster-capi-operator-main-e2e-aws-ovn-serial (all) - 5 runs, 40% failed, 50% of failures match = 20% impact
      pull-ci-openshift-cluster-version-operator-master-e2e-agnostic-operator (all) - 10 runs, 10% failed, 100% of failures match = 10% impact
      pull-ci-openshift-cluster-monitoring-operator-master-e2e-aws-ovn-single-node (all) - 15 runs, 27% failed, 25% of failures match = 7% impact
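The "impact" column in the listing above looks like the failure rate multiplied by the share of failures that match. That reading is inferred from the numbers shown here, not taken from search.ci documentation. A small sketch of the arithmetic, with `impactPercent` as a hypothetical helper name:

```go
package main

import "fmt"

// impactPercent returns the share of all runs that hit the matched failure,
// given the percentage of runs that failed and the percentage of those
// failures that match. Assumption: this is how the "impact" figure is
// derived; the source only shows the resulting numbers.
func impactPercent(failedPct, matchPct float64) float64 {
	return failedPct * matchPct / 100
}

func main() {
	// Values copied from rows of the listing: failed %, match %.
	rows := [][2]float64{
		{42, 20},      // listed as 8% impact
		{33, 100},     // listed as 33% impact
		{25, 200},     // listed as 50% impact
		{16.87, 0.36}, // summary line: listed as 0.06% of runs
	}
	for _, r := range rows {
		fmt.Printf("%.2f%% failed x %.2f%% match = %.2f%% impact\n",
			r[0], r[1], impactPercent(r[0], r[1]))
	}
}
```

Because the listed percentages are already rounded, the product can differ from the listed impact by about a point on some rows.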
      

      Steps to Reproduce

      1. Check search.ci: https://search.ci.openshift.org/?search=interface+%5C%7B%5C%7D+is+cache%5C.DeletedFinalStateUnknown%2C+not+%5C*v1%5C.Pod&maxAge=48h&context=1&type=bug%2Bissue%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

      Actual results

      CI fails.

      Expected results

      CI passes, or fails on some other test failure.

      Additional info

      The failing jobs include various platforms (AWS, GCP, vSphere, Nutanix, Windows) and appear to have started on 2023-06-18; these are the oldest occurrences of the failure that I found in search.ci:

              jgil@redhat.com Jordi Gil
              mmasters1@redhat.com Miciah Masters
              Anurag Saxena Anurag Saxena