Bug | Resolution: Done | Critical | 4.20.0 | Quality / Stability / Reliability | Important | Proposed
For roughly three days, 4.20 nightlies have been permafailing on this job:
[sig-instrumentation] Prometheus [apigroup:image.openshift.io] when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early][apigroup:config.openshift.io] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]
[ { "metric": { "__name__": "ALERTS", "alertname": "KubeJobFailed", "alertstate": "firing", "condition": "true", "container": "kube-rbac-proxy-main", "endpoint": "https-main", "job": "kube-state-metrics", "job_name": "sre-replace-packageserver-csv", "namespace": "openshift-operator-lifecycle-manager", "prometheus": "openshift-monitoring/k8s", "service": "kube-state-metrics", "severity": "warning" }, "value": [ 1748242812.353, "1" ] } ]
Possibly this is related to a late change the OLM teams are trying to get into 4.19 to update their catalog versions?
Example job failure: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.20-e2e-rosa-sts-ovn/1926847779737440256
Once this is addressed, please check whether there's a brittle SRE job here that could be improved to prevent recurrences in the future.