OpenShift Bugs / OCPBUGS-48340

Component Readiness: OperatorHubSourceError when disableAllDefaultSources is true


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Undefined
    • None
    • 4.18.0
    • OLM
    • None
    • Rejected
    • False

      Component Readiness has found a potential regression in the following test:

      [sig-instrumentation] Prometheus [apigroup:image.openshift.io] when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early][apigroup:config.openshift.io] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]

      Significant regression detected.
      Fisher's exact probability of a regression: 99.95%.
      Test pass rate dropped from 99.06% to 93.75%.

      Sample (being evaluated) Release: 4.18
      Start Time: 2025-01-06T00:00:00Z
      End Time: 2025-01-13T16:00:00Z
      Success Rate: 93.75%
      Successes: 45
      Failures: 3
      Flakes: 0

      Base (historical) Release: 4.17
      Start Time: 2024-09-01T00:00:00Z
      End Time: 2024-10-01T23:59:59Z
      Success Rate: 99.06%
      Successes: 210
      Failures: 2
      Flakes: 0
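      The pass rates above follow directly from the counts. To illustrate the statistic the report names, the sketch below also runs a textbook one-sided Fisher's exact test on the 2x2 pass/fail table using only the Python standard library. It is not Component Readiness's own implementation: on these raw counts it gives a regression probability of roughly 95%, not the reported 99.95%, so the tool presumably works from different inputs or applies further adjustments. The function names are hypothetical.

```python
from math import comb


def pass_rate(successes: int, failures: int) -> float:
    """Pass rate as a percentage, ignoring flakes (all zero in this report)."""
    return 100.0 * successes / (successes + failures)


def fisher_one_sided_p(sample_fail: int, sample_pass: int,
                       base_fail: int, base_pass: int) -> float:
    """One-sided Fisher's exact p-value for 'the sample fails more often than base'.

    With row and column totals fixed, the sample's failure count follows a
    hypergeometric distribution; sum the probability of every table at least
    as extreme as the observed one (i.e. sample failures >= observed).
    """
    sample_n = sample_fail + sample_pass
    base_n = base_fail + base_pass
    total_fail = sample_fail + base_fail
    denom = comb(sample_n + base_n, total_fail)
    p = 0.0
    for k in range(sample_fail, min(total_fail, sample_n) + 1):
        p += comb(sample_n, k) * comb(base_n, total_fail - k) / denom
    return p


# Counts copied from the Sample (4.18) and Base (4.17) sections above.
sample_rate = pass_rate(45, 3)   # 93.75
base_rate = pass_rate(210, 2)    # ~99.06
p = fisher_one_sided_p(sample_fail=3, sample_pass=45, base_fail=2, base_pass=210)
print(f"sample {sample_rate:.2f}%, base {base_rate:.2f}%")
print(f"one-sided p = {p:.4f}, 1 - p = {1 - p:.2%}")
```

      Summing exact hypergeometric terms avoids any dependency such as SciPy; with totals this small the loop has only a handful of terms.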

      View the test details report for additional context.

      In two of the three failures referenced from the test details link, the following alert was firing:

          [
            {
              "metric": {
                "__name__": "ALERTS",
                "alertname": "OperatorHubSourceError",
                "alertstate": "firing",
                "container": "catalog-operator",
                "endpoint": "https-metrics",
                "exported_namespace": "openshift-marketplace",
                "instance": "[fd01:0:0:1::1a]:8443",
                "job": "catalog-operator-metrics",
                "name": "community-operators",
                "namespace": "openshift-operator-lifecycle-manager",
                "pod": "catalog-operator-6c446dcbbb-sxvjz",
                "prometheus": "openshift-monitoring/k8s",
                "service": "catalog-operator-metrics",
                "severity": "warning"
              },
              "value": [
                1736751753.045,
                "1"
              ]
            }
          ]
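      The payload above has the shape of a Prometheus instant-query result for the ALERTS series: a list of objects, each holding a `metric` label set and a `[timestamp, value]` pair. A minimal sketch for pulling out the relevant labels, assuming the JSON is available as-is (trimmed here; the helper name `firing_operatorhub_alerts` is hypothetical), shows that the alert concerns the `community-operators` catalog source in `openshift-marketplace` and converts the Unix timestamp to UTC:

```python
import json
from datetime import datetime, timezone

# Trimmed copy of the alert payload quoted above.
PAYLOAD = """
[
  {
    "metric": {
      "__name__": "ALERTS",
      "alertname": "OperatorHubSourceError",
      "alertstate": "firing",
      "exported_namespace": "openshift-marketplace",
      "name": "community-operators",
      "namespace": "openshift-operator-lifecycle-manager",
      "severity": "warning"
    },
    "value": [1736751753.045, "1"]
  }
]
"""


def firing_operatorhub_alerts(results):
    """Yield (catalog source, namespace, UTC time) for firing OperatorHubSourceError alerts."""
    for series in results:
        labels = series["metric"]
        if labels.get("alertname") != "OperatorHubSourceError":
            continue
        if labels.get("alertstate") != "firing":
            continue
        timestamp, _value = series["value"]
        when = datetime.fromtimestamp(float(timestamp), tz=timezone.utc)
        yield labels["name"], labels["exported_namespace"], when


alerts = list(firing_operatorhub_alerts(json.loads(PAYLOAD)))
for source, ns, when in alerts:
    print(f"{ns}/{source} firing at {when.isoformat()}")
```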
      

      This alert has been firing sporadically in CI lately, though it remains quite rare overall: https://search.dptools.openshift.org/?search=OperatorHubSourceError&maxAge=48h&context=1&type=bug%2Bissue%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

      What is happening to cause these alerts to fire?

      At the moment this is a regression for 4.18 and therefore a release blocker. I suspect it will clear on its own, but it is a good opportunity to find the cause. Input from the OLM team on what exactly happens in runs such as these two would help:

      https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.18-e2e-metal-ipi-upgrade-ovn-ipv6/1878675368131432448

      https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.18-e2e-metal-ipi-upgrade-ovn-ipv6/1877545344619778048

              Jian Zhang (rhn-support-jiazha)
              Devan Goodwin (rhn-engineering-dgoodwin)
              Jian Zhang
