OpenShift SDN / SDN-2992

failure in alerts test case for KubeDeploymentReplicasMismatch and KubePodCrashLooping

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major
    • Sprint: SDN Sprint 219

    Description

      job link

      must-gather

      log snippet from e2e log:

      [It] shouldn't report any unexpected alerts in firing or pending state [Suite:openshift/conformance/parallel]
        github.com/openshift/origin/test/extended/prometheus/prometheus.go:270
      Apr 23 13:06:41.140: INFO: Alerts were detected during test run which are allowed:
      
      alert etcdGRPCRequestsSlow pending for 7.982000112533569 seconds with labels: {endpoint="etcd-metrics", grpc_method="Range", grpc_service="etcdserverpb.KV", instance="10.0.0.7:9979", job="etcd", namespace="openshift-etcd", pod="etcd-ci-op-gtnrs4jd-5cb9e-zjx5j-master-1", service="etcd", severity="critical"} (allowed: has a separate e2e test)
      alert etcdGRPCRequestsSlow pending for 7.982000112533569 seconds with labels: {endpoint="etcd-metrics", grpc_method="Txn", grpc_service="etcdserverpb.KV", instance="10.0.0.7:9979", job="etcd", namespace="openshift-etcd", pod="etcd-ci-op-gtnrs4jd-5cb9e-zjx5j-master-1", service="etcd", severity="critical"} (allowed: has a separate e2e test)
      Apr 23 13:06:41.140: FAIL: Unexpected alerts fired or pending after the test run:
      
      alert KubeDeploymentReplicasMismatch fired for 1927 seconds with labels: {container="kube-rbac-proxy-main", deployment="downloads", endpoint="https-main", job="kube-state-metrics", namespace="openshift-console", service="kube-state-metrics", severity="warning"}
      alert KubePodCrashLooping fired for 1899 seconds with labels: {container="download-server", endpoint="https-main", job="kube-state-metrics", namespace="openshift-console", pod="downloads-5644b5d4ff-4rw5g", reason="CrashLoopBackOff", service="kube-state-metrics", severity="warning", uid="0ef8038f-255a-4708-8a06-c89989a1ef45"}
      
      Full Stack Trace
      github.com/onsi/ginkgo/internal/leafnodes.(*runner).runSync(0xc0000001a0)
      	github.com/onsi/ginkgo@v4.7.0-origin.0+incompatible/internal/leafnodes/runner.go:113 +0xba
      github.com/onsi/ginkgo/internal/leafnodes.(*runner).run(0xc003502ea0)
      	github.com/onsi/ginkgo@v4.7.0-origin.0+incompatible/internal/leafnodes/runner.go:64 +0x125
      github.com/onsi/ginkgo/internal/leafnodes.(*ItNode).Run(0x7fb7e69cffff)
      	github.com/onsi/ginkgo@v4.7.0-origin.0+incompatible/internal/leafnodes/it_node.go:26 +0x7b
      github.com/onsi/ginkgo/internal/spec.(*Spec).runSample(0xc002e8a2d0, 0xc003503268, {0x8c6fc00, 0xc0005bcc00})
      	github.com/onsi/ginkgo@v4.7.0-origin.0+incompatible/internal/spec/spec.go:215 +0x2a9
      github.com/onsi/ginkgo/internal/spec.(*Spec).Run(0xc002e8a2d0, {0x8c6fc00, 0xc0005bcc00})
      	github.com/onsi/ginkgo@v4.7.0-origin.0+incompatible/internal/spec/spec.go:138 +0xe7
      github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).runSpec(0xc000b6c640, 0xc002e8a2d0)
      	github.com/onsi/ginkgo@v4.7.0-origin.0+incompatible/internal/specrunner/spec_runner.go:200 +0xe5
      github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).runSpecs(0xc000b6c640)
      	github.com/onsi/ginkgo@v4.7.0-origin.0+incompatible/internal/specrunner/spec_runner.go:170 +0x1a5
      github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).Run(0xc000b6c640)
      	github.com/onsi/ginkgo@v4.7.0-origin.0+incompatible/internal/specrunner/spec_runner.go:66 +0xc5
      github.com/onsi/ginkgo/internal/suite.(*Suite).Run(0xc0005d0730, {0x8c6ff20, 0xc002ab3590}, {0x0, 0x25f3c9b}, {0xc002a86db0, 0x1, 0x1}, {0x8d851f8, 0xc0005bcc00}, ...)
      	github.com/onsi/ginkgo@v4.7.0-origin.0+incompatible/internal/suite/suite.go:62 +0x4b2
      github.com/openshift/origin/pkg/test/ginkgo.(*TestOptions).Run(0xc001c4dc80, {0xc001b4d960, 0xc640430, 0x484ebe0})
      	github.com/openshift/origin/pkg/test/ginkgo/cmd_runtest.go:61 +0x3be
      main.newRunTestCommand.func1.1()
      	github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:434 +0x32
      github.com/openshift/origin/test/extended/util.WithCleanup(0xc001c3fc18)
      	github.com/openshift/origin/test/extended/util/test.go:168 +0xad
      main.newRunTestCommand.func1(0xc0003fd180, {0xc001b4d960, 0x1, 0x1})
      	github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:434 +0x349
      github.com/spf13/cobra.(*Command).execute(0xc0003fd180, {0xc001b4d930, 0x1, 0x1})
      	github.com/spf13/cobra@v1.2.1/command.go:856 +0x60e
      github.com/spf13/cobra.(*Command).ExecuteC(0xc00017e780)
      	github.com/spf13/cobra@v1.2.1/command.go:974 +0x3bc
      github.com/spf13/cobra.(*Command).Execute(...)
      	github.com/spf13/cobra@v1.2.1/command.go:902
      main.main.func1(0xc000954800)
      	github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:84 +0x8a
      main.main()
      	github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:85 +0x3b6
      [AfterEach] [sig-instrumentation][Late] Alerts
        github.com/openshift/origin/test/extended/util/client.go:151
      [AfterEach] [sig-instrumentation][Late] Alerts
        github.com/openshift/origin/test/extended/util/client.go:152
      fail [github.com/onsi/ginkgo@v4.7.0-origin.0+incompatible/internal/leafnodes/runner.go:113]: Apr 23 13:06:41.140: Unexpected alerts fired or pending after the test run:
      
      alert KubeDeploymentReplicasMismatch fired for 1927 seconds with labels: {container="kube-rbac-proxy-main", deployment="downloads", endpoint="https-main", job="kube-state-metrics", namespace="openshift-console", service="kube-state-metrics", severity="warning"}
      alert KubePodCrashLooping fired for 1899 seconds with labels: {container="download-server", endpoint="https-main", job="kube-state-metrics", namespace="openshift-console", pod="downloads-5644b5d4ff-4rw5g", reason="CrashLoopBackOff", service="kube-state-metrics", severity="warning", uid="0ef8038f-255a-4708-8a06-c89989a1ef45"}
      
      failed: (1.9s) 2022-04-23T13:06:41 "[sig-instrumentation][Late] Alerts shouldn't report any unexpected alerts in firing or pending state [Suite:openshift/conformance/parallel]"
      

      This is happening across different namespaces and pods, so I am assuming the failures all share the same root cause. I will add job links for the other occurrences in the comments as I come across them.

      link to this job's testgrid for reference.
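
      Because the same two alerts show up in several jobs, it may help to pull the alert lines out of each e2e log and group them by alert name and namespace. The following is a minimal sketch, not part of the origin test suite: the function names are hypothetical, and the regular expression only assumes the line format visible in the log snippet above (it would mis-parse a label value that contains a closing brace).

```python
import re
from collections import defaultdict

# Matches lines such as:
#   alert <name> fired for <n> seconds with labels: {k="v", ...}
# Assumption: the format is taken only from the e2e log snippet in this report.
ALERT_RE = re.compile(
    r'alert (?P<name>\S+) (?P<state>fired|pending) for (?P<seconds>[\d.]+) seconds '
    r'with labels: \{(?P<labels>.*?)\}'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_alert_lines(log_text):
    """Return one dict per alert line found in the log text."""
    alerts = []
    for m in ALERT_RE.finditer(log_text):
        alerts.append({
            "name": m.group("name"),
            "state": m.group("state"),
            "seconds": float(m.group("seconds")),
            "labels": dict(LABEL_RE.findall(m.group("labels"))),
        })
    return alerts


def group_by_alert_and_namespace(alerts):
    """Map (alert name, namespace) to the set of affected pods or deployments.

    A set is used because the e2e log prints the failure message twice.
    """
    groups = defaultdict(set)
    for alert in alerts:
        labels = alert["labels"]
        key = (alert["name"], labels.get("namespace", ""))
        groups[key].add(labels.get("pod") or labels.get("deployment") or "")
    return dict(groups)


# The two unexpected alerts from this report, copied from the log snippet.
SAMPLE = '''
alert KubeDeploymentReplicasMismatch fired for 1927 seconds with labels: {container="kube-rbac-proxy-main", deployment="downloads", endpoint="https-main", job="kube-state-metrics", namespace="openshift-console", service="kube-state-metrics", severity="warning"}
alert KubePodCrashLooping fired for 1899 seconds with labels: {container="download-server", endpoint="https-main", job="kube-state-metrics", namespace="openshift-console", pod="downloads-5644b5d4ff-4rw5g", reason="CrashLoopBackOff", service="kube-state-metrics", severity="warning", uid="0ef8038f-255a-4708-8a06-c89989a1ef45"}
'''

if __name__ == "__main__":
    for key, targets in group_by_alert_and_namespace(parse_alert_lines(SAMPLE)).items():
        print(key, sorted(targets))
```

      Running the same functions over the logs of the other failing jobs and comparing the resulting keys would be one way to check whether the failures really cluster on the same alerts, or whether the namespaces and pods differ enough to suggest separate causes.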

       

          People

            pepalani@redhat.com Periyasamy Palanichamy
            jluhrsen Jamo Luhrsen