Type: Bug
Resolution: Unresolved
Priority: Normal
Affects Version: 4.17.0
Severity: Low
The new test, [sig-node] kubelet metrics endpoints should always be reachable, is picking up some upgrade job runs where the metrics endpoint goes down for about 30 seconds during the generic node update phase and recovers before we reboot the node. As initially written, the test treats this as a flake because the outage has no overlap with a reboot.
Example: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.17-e2e-gcp-ovn-upgrade/1806142925785010176
Interval chart showing the problem: https://sippy.dptools.openshift.org/sippy-ng/job_runs/1806142925785010176/periodic-ci-openshift-release-master-ci-4.17-e2e-gcp-ovn-upgrade/intervals?filterText=master-1&intervalFile=e2e-timelines_spyglass_20240627-024633.json&overrideDisplayFlag=0&selectedSources=E2EFailed&selectedSources=MetricsEndpointDown&selectedSources=NodeState
The master outage at 3:30:59 causes a flake I'd rather avoid, because it doesn't extend into the reboot. I'd like to tighten this up so the test ignores any outage that overlaps with the node update phase, not just the reboot. This will be backported to 4.16 to tighten the signal there as well.
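For illustration, here is a minimal sketch in Go of the tightened overlap check; all names and timestamps here are hypothetical stand-ins, not the actual test code. The idea is that an outage interval should be excused when it intersects the node's whole update window, not only the narrower reboot window:

```go
package main

import (
	"fmt"
	"time"
)

// interval is a hypothetical stand-in for a monitor interval with a start and end time.
type interval struct {
	from, to time.Time
}

// overlaps reports whether two half-open intervals [a.from, a.to) and [b.from, b.to) intersect.
func overlaps(a, b interval) bool {
	return a.from.Before(b.to) && b.from.Before(a.to)
}

func main() {
	base := time.Date(2024, 6, 27, 3, 30, 0, 0, time.UTC)
	// Outage observed at 3:30:59, lasting ~30s: during the update phase, before the reboot.
	outage := interval{base.Add(59 * time.Second), base.Add(89 * time.Second)}
	update := interval{base.Add(-10 * time.Minute), base.Add(20 * time.Minute)} // whole node update phase
	reboot := interval{base.Add(5 * time.Minute), base.Add(8 * time.Minute)}    // reboot window only

	// Reboot-only rule: this outage doesn't overlap the reboot, so it flakes the test.
	fmt.Println("excused under reboot-only rule:", overlaps(outage, reboot)) // false -> flake
	// Tightened rule: any overlap with the update window excuses the outage.
	fmt.Println("excused under update rule:", overlaps(outage, update)) // true -> no flake
}
```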
- blocks: OCPBUGS-36744 New kubelet metrics test should ignore outages during node update, not just reboot (MODIFIED)
- is cloned by: OCPBUGS-36744 New kubelet metrics test should ignore outages during node update, not just reboot (MODIFIED)
- relates to: OCPBUGS-35371 Kubelet metrics endpoints experiencing prolonged outages (ASSIGNED)
- links to