Type: Bug
Resolution: Done
Priority: Undefined
Fix Version/s: 4.21.0
Affects Version/s: 4.21.0
Component/s: Monitoring
Labels:
- disruption

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

None
Story Points:
None
Severity:
None
Regression:
None

Target Backport Versions:
None
Target Version:

4.21.0
Release Blocker:
None
Sprint:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Impact Score:

Release Note Status:
Done
Release Note Type:
Release Note Not Required
Release Note Text:
N/A

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

We have been tracking disruption in 4.21-e2e-agent-ha-dualstack-conformance jobs and have traced it back to MON-4290: add test for must-gather gather_metrics.

Most concerning are the disruption, monitoring failures, node reboots, and high CPU alerts noted in the intervals.

Observing CPU usage during the test shows high usage as well as outages.

topk(25,
  sum by (namespace) (
    rate(container_cpu_usage_seconds_total{container!="",pod!="",namespace=~".*must-gather.*|.*monitoring.*"}[5m])
  )
)
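
For anyone who wants to rerun the query above outside the console, the following is a minimal sketch, assuming a Prometheus-compatible endpoint that serves the standard `/api/v1/query` instant-query API. The helper names (`build_query_url`, `cpu_by_namespace`) and the sample payload are hypothetical and are not taken from the job artifacts; authentication and the actual HTTP call are left out.

```python
import json
from urllib.parse import urlencode

# Same expression as above, kept on one line for use as a URL parameter.
QUERY = (
    'topk(25, sum by (namespace) ('
    'rate(container_cpu_usage_seconds_total{container!="",pod!="",'
    'namespace=~".*must-gather.*|.*monitoring.*"}[5m])))'
)


def build_query_url(base_url, query=QUERY):
    """Return the instant-query URL for a Prometheus-compatible endpoint."""
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": query})


def cpu_by_namespace(payload):
    """Turn an instant-vector response into (namespace, cores) pairs, highest first."""
    if payload.get("status") != "success":
        raise ValueError("query failed: %s" % payload.get("error", "unknown error"))
    rows = []
    for series in payload["data"]["result"]:
        namespace = series["metric"].get("namespace", "<none>")
        # Each sample is [unix_timestamp, "value-as-string"].
        rows.append((namespace, float(series["value"][1])))
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    # Made-up payload in the documented response shape; NOT data from the job.
    sample = json.loads(
        '{"status": "success", "data": {"resultType": "vector", "result": ['
        '{"metric": {"namespace": "ns-a"}, "value": [0, "0.25"]},'
        '{"metric": {"namespace": "ns-b"}, "value": [0, "1.50"]}]}}'
    )
    for namespace, cores in cpu_by_namespace(sample):
        print(f"{namespace}: {cores:.2f} cores")
```

The values are CPU cores consumed per namespace averaged over the 5m rate window, so comparing them against the node's core count gives a rough idea of how much of the machine the must-gather and monitoring namespaces are taking during the test.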

This test also flakes regularly.

We want to revert the test while it is reworked, so that its impact on other jobs and tests can be evaluated and the CPU issues either addressed or limited enough not to cause disruption or test failures. It does not appear to be a clean revert, so skipping the test, either entirely or only for the most impacted metal jobs, is an alternative. Either way, we would like something done quickly to address the disruption and failures while longer-term fixes are considered.


image-2025-10-15-10-04-09-796.png
432 kB
2025/10/15 2:04 PM
image-2025-10-15-10-05-14-166.png
20 kB
2025/10/15 2:05 PM

links to

openshift/origin#30386: OCPBUGS-63149: revert https://github.com/openshift/origin/pull/30054

Assignee: Ayoub Mrini

Reporter: Forrest Babcock

QA Contact: Junqi Zhao

Need Info From: None

Votes: 0

Watchers: 5

Created: 2025/10/15 2:07 PM

Updated: 2026/02/10 9:56 AM

Resolved: 2026/02/10 9:56 AM
