OpenShift Bugs / OCPBUGS-57031

Component Readiness: insights-runtime-extractor KubeDaemonSetRolloutStuck alert firing on ROSA

    • Quality / Stability / Reliability

      (Feel free to update this bug's summary to be more specific.)
      Component Readiness has found a potential regression in the following test:

      [sig-instrumentation] Prometheus [apigroup:image.openshift.io] when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early][apigroup:config.openshift.io] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]

      Significant regression detected.
      Fisher's exact test probability of a regression: 99.95%.
      Test pass rate dropped from 95.12% to 80.95%.

      Sample (being evaluated) Release: 4.19
      Start Time: 2025-05-27T00:00:00Z
      End Time: 2025-06-03T16:00:00Z
      Success Rate: 80.95%
      Successes: 17
      Failures: 4
      Flakes: 0

      Base (historical) Release: 4.18
      Start Time: 2025-01-26T00:00:00Z
      End Time: 2025-02-25T23:59:59Z
      Success Rate: 95.12%
      Successes: 77
      Failures: 4
      Flakes: 1
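
The pass rates above can be re-derived from the raw counts, and a Fisher's exact test can be run on the same numbers. The following is a minimal sketch, not Component Readiness's actual implementation. It assumes that flakes count as passes (which reproduces the 95.12% base rate) and uses a one-sided test for "more failures in the sample than in the base". The function names are hypothetical, and the resulting p-value may not match the 99.95% figure reported above, since the tool's exact inputs and method are not shown in this report.

```python
from math import comb


def pass_rate(successes, failures, flakes=0):
    """Pass rate in percent. Assumption: a flake counts as a pass."""
    passed = successes + flakes
    return 100.0 * passed / (passed + failures)


def fisher_one_sided(sample_fail, sample_pass, base_fail, base_pass):
    """One-sided Fisher's exact test on a 2x2 table.

    Returns the probability of seeing at least `sample_fail` failures in the
    sample if sample and base shared one underlying failure rate
    (hypergeometric null distribution).
    """
    n = sample_fail + sample_pass          # runs in the sample
    k_total = sample_fail + base_fail      # failures across both releases
    total = n + base_fail + base_pass      # all runs
    upper = min(k_total, n)
    ways = sum(
        comb(k_total, k) * comb(total - k_total, n - k)
        for k in range(sample_fail, upper + 1)
    )
    return ways / comb(total, n)


# Counts taken from the report above.
sample_rate = pass_rate(successes=17, failures=4, flakes=0)
base_rate = pass_rate(successes=77, failures=4, flakes=1)

# Base passes = 77 successes + 1 flake, per the assumption stated above.
p_value = fisher_one_sided(sample_fail=4, sample_pass=17,
                           base_fail=4, base_pass=78)

print(f"sample pass rate: {sample_rate:.2f}%")
print(f"base pass rate:   {base_rate:.2f}%")
print(f"one-sided p-value: {p_value:.4f}")
```

With only 21 sample runs, each additional failure moves the sample pass rate by almost five percentage points, so the sample rate is sensitive to a single job run.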

      View the test details report for additional context.

      All four of these failures seem to include the following firing alert:

            {
              "metric": {
                "__name__": "ALERTS",
                "alertname": "KubeDaemonSetRolloutStuck",
                "alertstate": "firing",
                "container": "kube-rbac-proxy-main",
                "daemonset": "insights-runtime-extractor",
                "endpoint": "https-main",
                "job": "kube-state-metrics",
                "namespace": "openshift-insights",
                "prometheus": "openshift-monitoring/k8s",
                "service": "kube-state-metrics",
                "severity": "warning"
              },
              "value": [
                1748649515.739,
                "1"
              ]
            }
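
The check named in the test title (no firing alerts apart from Watchdog and AlertmanagerReceiversNotConfigured) can be illustrated against a result shaped like the one above. This is a rough sketch under stated assumptions, not the conformance test's actual code: the helper name `unexpected_firing` is hypothetical, the first entry is an abbreviated copy of the alert from this report, and the Watchdog entry is made up for illustration.

```python
import json

# Alerts the test title says are allowed to fire.
ALLOWED = {"Watchdog", "AlertmanagerReceiversNotConfigured"}


def unexpected_firing(results):
    """Return the label sets of firing alerts that are not in ALLOWED."""
    return [
        r["metric"]
        for r in results
        if r["metric"].get("alertstate") == "firing"
        and r["metric"].get("alertname") not in ALLOWED
    ]


# First entry: abbreviated from the report. Second entry: hypothetical.
results = json.loads("""
[
  {
    "metric": {
      "__name__": "ALERTS",
      "alertname": "KubeDaemonSetRolloutStuck",
      "alertstate": "firing",
      "daemonset": "insights-runtime-extractor",
      "namespace": "openshift-insights",
      "severity": "warning"
    },
    "value": [1748649515.739, "1"]
  },
  {
    "metric": {
      "__name__": "ALERTS",
      "alertname": "Watchdog",
      "alertstate": "firing",
      "severity": "none"
    },
    "value": [1748649515.739, "1"]
  }
]
""")

offenders = unexpected_firing(results)
for labels in offenders:
    print(labels["alertname"], labels.get("namespace"), labels.get("daemonset"))
```

Only the KubeDaemonSetRolloutStuck entry is reported here, because the Watchdog alert is on the allow list.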
      

      One other run includes an additional alert, but let's not worry about that here.

      Sample job run: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.19-e2e-rosa-sts-ovn/1928175377595764736

              aos-workloads-staff Workloads Team Bot Account
              rhn-engineering-dgoodwin Devan Goodwin
              Jie Zhao Jie Zhao