This is a clone of issue OCPBUGS-48340. The following is the description of the original issue:
—
Component Readiness has found a potential regression in the following test:
[sig-instrumentation] Prometheus [apigroup:image.openshift.io] when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early][apigroup:config.openshift.io] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]
Significant regression detected.
Fisher's Exact probability of a regression: 99.95%.
Test pass rate dropped from 99.06% to 93.75%.
Sample (being evaluated) Release: 4.18
Start Time: 2025-01-06T00:00:00Z
End Time: 2025-01-13T16:00:00Z
Success Rate: 93.75%
Successes: 45
Failures: 3
Flakes: 0
Base (historical) Release: 4.17
Start Time: 2024-09-01T00:00:00Z
End Time: 2024-10-01T23:59:59Z
Success Rate: 99.06%
Successes: 210
Failures: 2
Flakes: 0
View the test details report for additional context.
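As a rough cross-check of the counts above, the pass rates and a one-sided Fisher's exact test can be recomputed from the success and failure totals. This is only a sketch: the function names are hypothetical, and it assumes a plain hypergeometric tail probability, which is not necessarily how Component Readiness derives its reported regression probability.

```python
from math import comb


def pass_rate(successes: int, failures: int) -> float:
    """Fraction of runs that passed."""
    return successes / (successes + failures)


def fisher_one_sided(sample_fail: int, sample_pass: int,
                     base_fail: int, base_pass: int) -> float:
    """One-sided Fisher's exact p-value.

    Probability of seeing at least `sample_fail` failures in the sample
    if sample and base runs shared one underlying failure rate
    (hypergeometric upper tail). Assumption: this is the textbook test,
    not the exact computation used by the report.
    """
    n_sample = sample_fail + sample_pass
    total = n_sample + base_fail + base_pass
    total_fail = sample_fail + base_fail
    upper = min(total_fail, n_sample)
    tail = sum(
        comb(total_fail, k) * comb(total - total_fail, n_sample - k)
        for k in range(sample_fail, upper + 1)
    )
    return tail / comb(total, n_sample)


# Counts taken from the report: 4.18 sample vs. 4.17 base.
sample_rate = pass_rate(45, 3)
base_rate = pass_rate(210, 2)
p_value = fisher_one_sided(sample_fail=3, sample_pass=45,
                           base_fail=2, base_pass=210)

print(f"sample pass rate: {sample_rate:.2%}")
print(f"base pass rate:   {base_rate:.2%}")
print(f"one-sided p-value: {p_value:.4f}")
```

The p-value printed here is a different quantity from the "probability of a regression" in the report, so the two numbers are not expected to match.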
From the test details link, two of the three referenced failures are as follows:
[
  {
    "metric": {
      "__name__": "ALERTS",
      "alertname": "OperatorHubSourceError",
      "alertstate": "firing",
      "container": "catalog-operator",
      "endpoint": "https-metrics",
      "exported_namespace": "openshift-marketplace",
      "instance": "[fd01:0:0:1::1a]:8443",
      "job": "catalog-operator-metrics",
      "name": "community-operators",
      "namespace": "openshift-operator-lifecycle-manager",
      "pod": "catalog-operator-6c446dcbbb-sxvjz",
      "prometheus": "openshift-monitoring/k8s",
      "service": "catalog-operator-metrics",
      "severity": "warning"
    },
    "value": [
      1736751753.045,
      "1"
    ]
  }
]
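To make the failure condition concrete, the following sketch filters a query result of the shape shown above for firing alerts other than the two the test name permits. It is an illustration only, not the test's actual implementation: the helper name `unexpected_firing_alerts` is hypothetical, and the sample is a trimmed copy of the result above.

```python
import json

# Alerts the test name explicitly tolerates in firing state.
ALLOWED_ALERTS = {"Watchdog", "AlertmanagerReceiversNotConfigured"}


def unexpected_firing_alerts(result: list) -> list:
    """Return the label sets of firing alerts that are not in ALLOWED_ALERTS."""
    return [
        series["metric"]
        for series in result
        if series["metric"].get("alertstate") == "firing"
        and series["metric"].get("alertname") not in ALLOWED_ALERTS
    ]


# Trimmed sample: the alert from the failure above plus a Watchdog entry
# (the Watchdog entry is added here for illustration).
sample_result = json.loads("""
[
  {"metric": {"__name__": "ALERTS", "alertname": "OperatorHubSourceError",
              "alertstate": "firing", "name": "community-operators",
              "severity": "warning"},
   "value": [1736751753.045, "1"]},
  {"metric": {"__name__": "ALERTS", "alertname": "Watchdog",
              "alertstate": "firing"},
   "value": [1736751753.045, "1"]}
]
""")

for labels in unexpected_firing_alerts(sample_result):
    print(labels["alertname"], labels.get("name"), labels.get("severity"))
```

Any entry returned by the filter corresponds to an alert that would make the test fail.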
This appears to have been happening sporadically in CI lately: https://search.dptools.openshift.org/?search=OperatorHubSourceError&maxAge=48h&context=1&type=bug%2Bissue%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
Though overall it looks quite rare.
What is happening to cause these alerts to fire?
At this moment, it's a regression for 4.18 and thus a release blocker. I suspect it will clear naturally, but this might be a good opportunity to look for the cause. We could use some input from OLM on what exactly is happening in runs such as these two:

- clones: OCPBUGS-48340 Component Readiness: OperatorHubSourceError when disableAllDefaultSources is true (Verified)
- is blocked by: OCPBUGS-48340 Component Readiness: OperatorHubSourceError when disableAllDefaultSources is true (Verified)