Type: Bug
Resolution: Obsolete
Priority: Major
Fix Version/s: None
Affects Version/s: 4.17
Component/s: Monitoring
Labels:
- disruption
- trt-standup

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

None
Story Points:
None
Severity:
None
Regression:
None

Target Backport Versions:
None
Target Version:
None
Release Blocker:
Rejected
Sprint:
MON Sprint 258, MON Sprint 259
sprint_count:
2

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

TRT received a disruption regression alert for the metrics-api backend. The 95th percentile (P95) appears to be 5s worse than in the 4.16 GA data.
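
To make the comparison above concrete, here is a minimal sketch of a P95 check between a baseline release and a current one. It is not how TRT computes its alert: the nearest-rank percentile method, the function names, the `tolerance_seconds` parameter, and the sample values are all assumptions made for illustration.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def p95_regression(baseline_seconds, current_seconds, tolerance_seconds):
    """Return (delta, regressed), where delta is current P95 minus baseline P95 in seconds."""
    delta = percentile(current_seconds, 95) - percentile(baseline_seconds, 95)
    return delta, delta > tolerance_seconds


# Made-up per-job disruption totals in seconds, for illustration only;
# these are not taken from the dashboard.
baseline = [0, 0, 1, 1, 2, 2, 2, 3, 3, 4]
current = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

delta, regressed = p95_regression(baseline, current, tolerance_seconds=3)
```

With real data, `baseline` would hold the per-job disruption seconds for 4.16 GA and `current` those for 4.17, filtered to the same platform, backend, and upgrade type as the dashboard query.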

Here is a dashboard where you can view the trend:

https://grafana-loki.ci.openshift.org/d/ISnBj4LVk/disruption?orgId=1&var-percentile=P95&var-platform=aws&var-backend=metrics-api-new-connections&var-upgrade_type=minor&var-master_nodes_updated=Y&var-architectures=amd64&var-topologies=ha&var-networks=ovn&var-releases=4.17

You can also scroll down to "Last 500 Job" to see the list of jobs with their disruption counts. Click any of them to open its Prow job and investigate further.

Here is an example job: 

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.17-upgrade-from-stable-4.16-e2e-aws-ovn-upgrade/1810404925734129664

The disruption appears to occur whenever one of the master nodes is being rebooted.

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

Issue Links:
is triggering: OCPBUGS-39133 Kube-aggregator reaching stale apiservice endpoints (Closed)

Assignee: Ayoub Mrini
Reporter: Ken Zhang
Need Info From: None
Contributors: None
QA Contact: Junqi Zhao
Doc Contact: None
Votes: 0
Watchers: 6
Created: 2024/07/10 11:31 AM
Updated: 2025/07/22 11:22 AM
Resolved: 2024/09/12 7:23 AM
