Type: Bug
Resolution: Obsolete
Priority: Normal
Fix Version/s: 4.16.z
Affects Version/s: 4.17.0
Component/s: Test Framework
Labels: None

Activity Type: Quality / Stability / Reliability
Blocked: False
Blocked Reason: None
Story Points: None
Severity: Low
Regression: No

Target Backport Versions: None
Target Version: 4.16.z
Release Blocker: None
Sprint: None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Impact Score:

Release Note Status: None
Release Note Type: None
Release Note Text: None

Escape Reason: None
Escape Impact: None
Corrective Measures: None
SDLC stage when should've been found: None

This is a clone of issue OCPBUGS-36263. The following is the description of the original issue:
—
The new test "[sig-node] kubelet metrics endpoints should always be reachable" is picking up some upgrade job runs where the metrics endpoint goes down for about 30 seconds during the generic node update phase and recovers before we reboot the node. The test flakes on these runs because, as initially written, it only tolerates outages that overlap the reboot, and these outages do not.

Example: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.17-e2e-gcp-ovn-upgrade/1806142925785010176
Interval chart showing the problem: https://sippy.dptools.openshift.org/sippy-ng/job_runs/1806142925785010176/periodic-ci-openshift-release-master-ci-4.17-e2e-gcp-ovn-upgrade/intervals?filterText=master-1&intervalFile=e2e-timelines_spyglass_20240627-024633.json&overrideDisplayFlag=0&selectedSources=E2EFailed&selectedSources=MetricsEndpointDown&selectedSources=NodeState

The master outage at 3:30:59 causes a flake only because it doesn't extend into the reboot. I'd rather it didn't flake.

I'd like to tighten this up so the test tolerates any outage that overlaps the node update, not just the reboot.

This will be backported to 4.16 to tighten the signal there as well.
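
The proposed rule can be sketched as an interval-overlap check. This is only an illustration, not the code in openshift/origin: the Interval type and the clock, overlaps, and unexcusedOutages helpers are hypothetical names. The outage start (3:30:59) and its roughly 30 second length come from the example above, while the update and reboot window times are made up. It also assumes the node update window contains the reboot.

```go
package main

import (
	"fmt"
	"time"
)

// Interval is a hypothetical closed time span; it is not origin's interval type.
type Interval struct {
	Name     string
	From, To time.Time
}

// clock parses an HH:MM:SS wall-clock time (illustration helper only).
func clock(s string) time.Time {
	t, err := time.Parse("15:04:05", s)
	if err != nil {
		panic(err)
	}
	return t
}

// overlaps reports whether the two intervals share at least one instant.
func overlaps(a, b Interval) bool {
	return !a.From.After(b.To) && !b.From.After(a.To)
}

// unexcusedOutages returns the outages that overlap none of the allowed windows.
func unexcusedOutages(outages, allowed []Interval) []Interval {
	var result []Interval
	for _, o := range outages {
		excused := false
		for _, w := range allowed {
			if overlaps(o, w) {
				excused = true
				break
			}
		}
		if !excused {
			result = append(result, o)
		}
	}
	return result
}

func main() {
	// Outage start and length follow the ticket; window times are assumed.
	outage := Interval{Name: "metrics endpoint down", From: clock("03:30:59"), To: clock("03:31:29")}
	update := Interval{Name: "node update", From: clock("03:29:00"), To: clock("03:36:00")}
	reboot := Interval{Name: "reboot", From: clock("03:34:00"), To: clock("03:36:00")}
	outages := []Interval{outage}

	// Original rule: only reboot overlap is excused, so the outage counts against the test.
	fmt.Println("reboot-only allowance:", len(unexcusedOutages(outages, []Interval{reboot})))
	// Proposed rule: any overlap with the node update is excused.
	fmt.Println("update allowance:", len(unexcusedOutages(outages, []Interval{update})))
}
```

With these assumed windows, the reboot-only allowance leaves one unexcused outage and the update allowance leaves none.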

clones

OCPBUGS-36263 New kubelet metrics test should ignore outages during node update, not just reboot

Closed

is blocked by

OCPBUGS-36263 New kubelet metrics test should ignore outages during node update, not just reboot

Closed

links to

openshift/origin#28928: [release-4.16] OCPBUGS-36744: Expand allowance for kubelet metrics api endpoint outages during node upgrades

Assignee: Devan Goodwin

Reporter: OpenShift Prow Bot

Need Info From: None

Votes: 0

Watchers: 2

Created: 2024/07/09 11:06 AM

Updated: 2026/01/26 9:07 PM

Resolved: 2026/01/26 9:07 PM
