Type: Bug
Resolution: Done-Errata
Priority: Undefined
Fix Version/s: 4.20.0
Affects Version/s: 4.19.0
Component/s: kube-apiserver
Labels: None

Activity Type: Quality / Stability / Reliability
Blocked: False
Blocked Reason: None
Story Points: None
Severity: Important
Regression: No

Target Backport Versions: 4.19.0
Target Version: 4.19.z
Release Blocker: Rejected
Sprint: None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Impact Score:

Release Note Status: In Progress
Release Note Type: Release Note Not Required
Release Note Text: N/A

Escape Reason: None
Escape Impact: None
Corrective Measures: None
SDLC stage when should've been found: None

It appears that all upgrade jobs now have interval charts showing three separate bars of mass disruption during kube-apiserver upgrade:

https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.19-e2e-vsphere-ovn-upgrade/1909492594849615872

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.19-e2e-metal-ipi-ovn-bm-upgrade/1914490406016389120

https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.19-e2e-aws-ovn-upgrade/1911995592435830784

We suspect that this kind of disruption is actually expected given it's being measured from localhost as each master is updated.

We can't leave this disruption in the charts: it looks quite alarming and will be an eternal pain point for anyone examining them.

There is precedent for shutting down disruption monitors when nodes are updating, specifically in the in-cluster disruption monitoring. I believe this was done by monitoring EndpointSlice to determine when to shut down and restart the monitor, somewhere around here: https://github.com/openshift/origin/blob/main/pkg/monitortests/network/disruptionpodnetwork/monitortest.go. Doing the same while the apiserver is rolling out would be one option.

Another option might be to alter or omit the disruption intervals when they are returned from the monitortest, if they overlap with a progressing interval.
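A minimal sketch of the "omit if overlapping" option, in Go. The `Interval` type, the `at` helper, and `dropOverlapping` are hypothetical names for illustration only; they are not the types or functions used in openshift/origin, and the real monitortest interval structure is richer than this.

```go
package main

import (
	"fmt"
	"time"
)

// Interval is a hypothetical, simplified stand-in for a monitor interval.
type Interval struct {
	Name string
	From time.Time
	To   time.Time
}

// at returns a timestamp the given number of minutes after an arbitrary base time.
// It exists only to keep the example data readable.
func at(minutes int) time.Time {
	base := time.Date(2025, 4, 22, 12, 0, 0, 0, time.UTC)
	return base.Add(time.Duration(minutes) * time.Minute)
}

// overlaps reports whether a and b share any span of time.
// Intervals that merely touch at an endpoint are not treated as overlapping.
func overlaps(a, b Interval) bool {
	return a.From.Before(b.To) && b.From.Before(a.To)
}

// dropOverlapping returns the disruption intervals that do not overlap
// any of the progressing intervals.
func dropOverlapping(disruptions, progressing []Interval) []Interval {
	kept := make([]Interval, 0, len(disruptions))
	for _, d := range disruptions {
		hidden := false
		for _, p := range progressing {
			if overlaps(d, p) {
				hidden = true
				break
			}
		}
		if !hidden {
			kept = append(kept, d)
		}
	}
	return kept
}

func main() {
	progressing := []Interval{
		{Name: "kube-apiserver progressing", From: at(10), To: at(20)},
	}
	disruptions := []Interval{
		{Name: "localhost disruption during rollout", From: at(12), To: at(13)},
		{Name: "disruption outside rollout", From: at(30), To: at(31)},
	}
	for _, d := range dropOverlapping(disruptions, progressing) {
		fmt.Println(d.Name) // prints only "disruption outside rollout"
	}
}
```

Whether overlapping intervals should be dropped entirely or only relabelled is a design choice this sketch does not settle; dropping them is simply the shorter case to show.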

Abu points out on Slack: "if we want to skip the ones that overlaps with a roll out, one thing we need to keep in mind is that, the rollout interval for an apiserver takes into account [termination start ... termination end], but it should be
[termination start ... termination end (old instance) ... ready to accept request (new instance)]"
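To illustrate why the wider window matters, here is a small Go sketch. The `Rollout` and `Window` types, their field names, and the `at` helper are assumptions made up for this example, not identifiers from openshift/origin. It shows a disruption sample that falls after the old instance has terminated but before the new instance is ready: the termination-only window misses it, while the extended window covers it.

```go
package main

import (
	"fmt"
	"time"
)

// Rollout holds hypothetical timestamps for one apiserver instance being replaced.
type Rollout struct {
	TerminationStart time.Time // old instance begins terminating
	TerminationEnd   time.Time // old instance has terminated
	NewInstanceReady time.Time // new instance is ready to accept requests
}

// Window is a half-open time range [From, To).
type Window struct {
	From time.Time
	To   time.Time
}

// Contains reports whether t falls inside the half-open window.
func (w Window) Contains(t time.Time) bool {
	return !t.Before(w.From) && t.Before(w.To)
}

// TerminationWindow covers only [termination start ... termination end].
func (r Rollout) TerminationWindow() Window {
	return Window{From: r.TerminationStart, To: r.TerminationEnd}
}

// ExtendedWindow also covers the gap until the new instance is ready.
// If the readiness timestamp is not later than termination end, it falls
// back to the termination window.
func (r Rollout) ExtendedWindow() Window {
	end := r.TerminationEnd
	if r.NewInstanceReady.After(end) {
		end = r.NewInstanceReady
	}
	return Window{From: r.TerminationStart, To: end}
}

// at returns a timestamp the given number of minutes after an arbitrary base time.
func at(minutes int) time.Time {
	base := time.Date(2025, 4, 22, 12, 0, 0, 0, time.UTC)
	return base.Add(time.Duration(minutes) * time.Minute)
}

func main() {
	r := Rollout{
		TerminationStart: at(10),
		TerminationEnd:   at(15),
		NewInstanceReady: at(18),
	}
	sample := at(16) // after the old instance is gone, before the new one is ready

	fmt.Println("termination window covers sample:", r.TerminationWindow().Contains(sample))
	fmt.Println("extended window covers sample:", r.ExtendedWindow().Contains(sample))
}
```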

Not critical for 4.19 release.

blocks

OCPBUGS-59868 New disruption monitoring reporting 3 bars of disruption during kube-apiserver progressing

is cloned by

OCPBUGS-59868 New disruption monitoring reporting 3 bars of disruption during kube-apiserver progressing

links to

openshift/origin#29710: OCPBUGS-55238: spyglass: hide disruption events for localhost

openshift/origin#30034: Revert "OCPBUGS-55238: spyglass: hide disruption events for localhost"

RHBA-2025:12341 OpenShift Container Platform 4.19.7 bug fix update

Assignee: Unassigned

Reporter: Devan Goodwin

Need Info From: None

Contributors: None

QA Contact: Ke Wang

Doc Contact: None

Votes: 0

Watchers: 5

Created: 2025/04/22 5:15 PM

Updated: 2025/09/19 4:11 PM

Resolved: 2025/08/05 5:44 AM
