OpenShift Core Networking / CORENET-2207

TargetDown alerts firing for multiple services


    • Sprint: SDN Sprint 219

      job link

      must-gather

      This comes from the test case "Check if alerts are firing during or after upgrade success"; the relevant snippet from the test log is:

      {May  4 09:58:01.856: Unexpected alerts fired or pending during the upgrade:
      
      alert TargetDown fired for 30 seconds with labels: {job="dns-default", namespace="openshift-dns", service="dns-default", severity="warning"}
      alert TargetDown fired for 60 seconds with labels: {job="network-metrics-service", namespace="openshift-multus", service="network-metrics-service", severity="warning"}
      alert TargetDown fired for 60 seconds with labels: {job="ovnkube-node", namespace="openshift-ovn-kubernetes", service="ovn-kubernetes-node", severity="warning"}
      alert TargetDown fired for 90 seconds with labels: {job="node-exporter", namespace="openshift-monitoring", service="node-exporter", severity="warning"} Failure May  4 09:58:01.856: Unexpected alerts fired or pending during the upgrade:
      
      alert TargetDown fired for 30 seconds with labels: {job="dns-default", namespace="openshift-dns", service="dns-default", severity="warning"}
      alert TargetDown fired for 60 seconds with labels: {job="network-metrics-service", namespace="openshift-multus", service="network-metrics-service", severity="warning"}
      alert TargetDown fired for 60 seconds with labels: {job="ovnkube-node", namespace="openshift-ovn-kubernetes", service="ovn-kubernetes-node", severity="warning"}
      alert TargetDown fired for 90 seconds with labels: {job="node-exporter", namespace="openshift-monitoring", service="node-exporter", severity="warning"}
      
      github.com/openshift/origin/test/extended/util/disruption.(*chaosMonkeyAdapter).Test(0xc001790e10, 0xc0009c3110)
      	github.com/openshift/origin/test/extended/util/disruption/disruption.go:192 +0x32f
      k8s.io/kubernetes/test/e2e/chaosmonkey.(*Chaosmonkey).Do.func1()
      	k8s.io/kubernetes@v1.23.0/test/e2e/chaosmonkey/chaosmonkey.go:90 +0x6a
      created by k8s.io/kubernetes/test/e2e/chaosmonkey.(*Chaosmonkey).Do
      	k8s.io/kubernetes@v1.23.0/test/e2e/chaosmonkey/chaosmonkey.go:87 +0x8c}
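
      For anyone digging into this on a live cluster, the signal behind TargetDown is the per-target "up" series, so a quick check is to ask the in-cluster query endpoint which of the scrape jobs named in the alert are reporting down. A minimal sketch, assuming the usual thanos-querier route in openshift-monitoring and bearer-token auth (route name and auth flow can vary by release; the job regex just reuses the labels from the alert above):

      TOKEN=$(oc whoami -t)
      HOST=$(oc get route thanos-querier -n openshift-monitoring -o jsonpath='{.spec.host}')
      # instant query: which of these scrape targets does Prometheus currently consider down?
      curl -sk -H "Authorization: Bearer $TOKEN" "https://$HOST/api/v1/query" \
        --data-urlencode 'query=up{job=~"dns-default|network-metrics-service|ovnkube-node|node-exporter"} == 0'

      For the upgrade window itself, the same expression can be run against /api/v1/query_range with explicit start/end timestamps to see when each target dropped out.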
      

      ["oc get pods"|] that is captured at the end of the job looks like it might show that the pods related to the
      above alerts have some restart counts that we might not expect. It appears that the alerts started firing
      toward the end of the final node was coming back up from it's upgrade reboot (FWIW).
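
      A minimal sketch of the restart-count check meant above, using only the namespaces from the alert labels (the output columns are an assumption about what is useful to eyeball, not what the job archives):

      # restart counts and node placement for the pods behind the alerting services
      for ns in openshift-dns openshift-multus openshift-ovn-kubernetes openshift-monitoring; do
        echo "== $ns =="
        oc get pods -n "$ns" \
          -o custom-columns='NAME:.metadata.name,RESTARTS:.status.containerStatuses[*].restartCount,NODE:.spec.nodeName'
      done

      Cross-referencing the NODE column against the node reboot order should show whether the restarts line up with the final node's upgrade reboot mentioned above.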

      link to this job's testgrid for reference.

       

              Assignee: Mohamed Mahmoud (mmahmoud@redhat.com, Inactive)
              Reporter: Jamo Luhrsen (jluhrsen)