OpenShift Bugs / OCPBUGS-43378

prometheus pods can crash in rare scenarios


    • Important
    • None
    • Rejected
    • False
    • Release Note Not Required
    • In Progress

      In https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-shiftstack-ci-release-4.18-e2e-openstack-ovn-etcd-scaling/1834144693181485056 I noticed the following panic:

       Undiagnosed panic detected in pod
      {  pods/openshift-monitoring_prometheus-k8s-1_prometheus_previous.log.gz:ts=2024-09-12T09:30:09.273Z caller=klog.go:124 level=error component=k8s_client_runtime func=Errorf msg="Observed a panic: &runtime.TypeAssertionError{_interface:(*abi.Type)(0x3180480), concrete:(*abi.Type)(0x34a31c0), asserted:(*abi.Type)(0x3a0ac40), missingMethod:\"\"} (interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *v1.Node)
      goroutine 13218 [running]:
      k8s.io/apimachinery/pkg/util/runtime.logPanic({0x32f1080, 0xc05be06840})
          /go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x90
      k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc010ef6000?})
          /go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x6b
      panic({0x32f1080?, 0xc05be06840?})
          /usr/lib/golang/src/runtime/panic.go:770 +0x132
      github.com/prometheus/prometheus/discovery/kubernetes.NewEndpoints.func11({0x34a31c0?, 0xc05bf3a580?})
          /go/src/github.com/prometheus/prometheus/discovery/kubernetes/endpoints.go:170 +0x4e
      k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnDelete(...)
          /go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/controller.go:253
      k8s.io/client-go/tools/cache.(*processorListener).run.func1()
          /go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/shared_informer.go:977 +0x9f
      k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
          /go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33
      k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc00fc92f70, {0x456ed60, 0xc031a6ba10}, 0x1, 0xc015a04fc0)
          /go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf
      k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc011678f70, 0x3b9aca00, 0x0, 0x1, 0xc015a04fc0)
          /go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f
      k8s.io/apimachinery/pkg/util/wait.Until(...)
          /go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161
      k8s.io/client-go/tools/cache.(*processorListener).run(0xc04c607440)
          /go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/shared_informer.go:966 +0x69
      k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
          /go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:72 +0x52
      created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start in goroutine 12933
          /go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:70 +0x73
      "}
      

      This issue seems relatively common on OpenStack, where runs very frequently fail with this panic.
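The trace points at an `OnDelete` handler created in `NewEndpoints` (endpoints.go:170) that type-asserts its argument to `*v1.Node`. client-go informers can instead deliver a `cache.DeletedFinalStateUnknown` tombstone when a delete event was missed (for example while the watch was disconnected) and only noticed on relist, so an unchecked assertion panics. Below is a minimal, dependency-free Go sketch of the usual defensive pattern. It is an illustration under stated assumptions, not Prometheus's actual code or fix: `Node` and `DeletedFinalStateUnknown` are local stand-ins that only mimic the shapes of `*v1.Node` and client-go's tombstone type (`Key` and `Obj` fields), and `naiveOnDelete` / `nodeFromDeleteEvent` are hypothetical helpers.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for *v1.Node (k8s.io/api/core/v1).
type Node struct {
	Name string
}

// DeletedFinalStateUnknown is a hypothetical stand-in that mimics client-go's
// cache.DeletedFinalStateUnknown: a tombstone carrying the object's key and
// its last known (possibly stale) state.
type DeletedFinalStateUnknown struct {
	Key string
	Obj interface{}
}

// naiveOnDelete reproduces the failing pattern: an unchecked type assertion.
// The recover only exists so this demo can report the panic instead of crashing.
func naiveOnDelete(obj interface{}) (name string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	node := obj.(*Node) // panics when obj is a tombstone
	return node.Name, nil
}

// nodeFromDeleteEvent accepts either the object itself or a tombstone that
// wraps it, and reports false instead of panicking on anything else.
func nodeFromDeleteEvent(obj interface{}) (*Node, bool) {
	switch o := obj.(type) {
	case *Node:
		return o, true
	case DeletedFinalStateUnknown:
		node, ok := o.Obj.(*Node)
		return node, ok
	default:
		return nil, false
	}
}

func main() {
	tombstone := DeletedFinalStateUnknown{Key: "worker-1", Obj: &Node{Name: "worker-1"}}

	// The unchecked assertion fails on the tombstone, as in the reported trace.
	if _, err := naiveOnDelete(tombstone); err != nil {
		fmt.Println(err)
	}

	// The defensive version unwraps the tombstone instead of panicking.
	events := []interface{}{
		&Node{Name: "worker-0"},
		tombstone,
		DeletedFinalStateUnknown{Key: "bogus", Obj: "not a node"},
	}
	for _, ev := range events {
		if node, ok := nodeFromDeleteEvent(ev); ok {
			fmt.Println("deleted node:", node.Name)
		} else {
			fmt.Printf("ignoring unexpected delete payload: %#v\n", ev)
		}
	}
}
```

Whether a real handler should use the tombstone's `Obj` (which may be stale), skip it, or only log it is a design choice for the actual fix; the sketch simply shows that checking the type avoids the crash.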

      Linked test name: Undiagnosed panic detected in pod

            Ayoub Mrini
            Devan Goodwin
            Junqi Zhao
            Votes: 0
            Watchers: 5
