OpenShift SDN / SDN-3022

ClusterOperatorDown alert firing


    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major

      job link

      must-gather

      e2e log

      This failure comes from the test case "Check if alerts are firing during or after upgrade success"; the test log snippet is here:

      {May  4 09:57:07.721: Unexpected alerts fired or pending during the upgrade:
      
      alert ClusterOperatorDown fired for 1290 seconds with labels: {endpoint="metrics", instance="10.0.184.18:9099", job="cluster-version-operator", name="monitoring", namespace="openshift-cluster-version", pod="cluster-version-operator-6f9db9dd74-bfnzr", service="cluster-version-operator", severity="critical", version="4.10.12"}
      alert KubePodNotReady fired for 1650 seconds with labels: {namespace="openshift-monitoring", pod="alertmanager-main-0", severity="warning"}
      alert KubeStatefulSetReplicasMismatch fired for 1170 seconds with labels: {container="kube-rbac-proxy-main", endpoint="https-main", job="kube-state-metrics", namespace="openshift-monitoring", service="kube-state-metrics", severity="warning", statefulset="alertmanager-main"} Failure May  4 09:57:07.721: Unexpected alerts fired or pending during the upgrade:
      
      alert ClusterOperatorDown fired for 1290 seconds with labels: {endpoint="metrics", instance="10.0.184.18:9099", job="cluster-version-operator", name="monitoring", namespace="openshift-cluster-version", pod="cluster-version-operator-6f9db9dd74-bfnzr", service="cluster-version-operator", severity="critical", version="4.10.12"}
      alert KubePodNotReady fired for 1650 seconds with labels: {namespace="openshift-monitoring", pod="alertmanager-main-0", severity="warning"}
      alert KubeStatefulSetReplicasMismatch fired for 1170 seconds with labels: {container="kube-rbac-proxy-main", endpoint="https-main", job="kube-state-metrics", namespace="openshift-monitoring", service="kube-state-metrics", severity="warning", statefulset="alertmanager-main"}
      
      github.com/openshift/origin/test/extended/util/disruption.(*chaosMonkeyAdapter).Test(0xc001afb0e0, 0xc003111068)
      	github.com/openshift/origin/test/extended/util/disruption/disruption.go:192 +0x32f
      k8s.io/kubernetes/test/e2e/chaosmonkey.(*Chaosmonkey).Do.func1()
      	k8s.io/kubernetes@v1.23.0/test/e2e/chaosmonkey/chaosmonkey.go:90 +0x6a
      created by k8s.io/kubernetes/test/e2e/chaosmonkey.(*Chaosmonkey).Do
      	k8s.io/kubernetes@v1.23.0/test/e2e/chaosmonkey/chaosmonkey.go:87 +0x8c}
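
      As context (not taken from this job's artifacts): ClusterOperatorDown is a CVO-managed alert that fires,
      roughly, when the cluster_operator_up metric reports an operator as down for an extended period
      (~10 minutes). A minimal sketch of commands to confirm which operator and conditions tripped it on a
      live cluster, assuming the standard OCP resource and namespace names:

      # Look for the CVO-managed PrometheusRule that should define ClusterOperatorDown
      # (assumption: it lists under a cluster-version name)
      oc get prometheusrules -A | grep -i cluster-version

      # Inspect the operator named in the alert (name="monitoring") and its conditions
      oc get clusteroperator monitoring
      oc describe clusteroperator monitoring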

      The pod mentioned above is the CVO pod, and you can see the period it was down in the
      intervals graph at the top of the Prow job page. It was down for ~20m, before the node-reboot
      part of the upgrade started. Looking at the 'oc get pods' output from the gather-extra
      artifacts, you can see the new (upgraded) CVO pod is up with no restarts. So something was
      probably breaking, or broken, during the initial upgrade process.
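
      For retriage, a minimal sketch of the checks described above, assuming access to a live cluster
      (the gather-extra artifacts contain the equivalent 'oc get pods' output):

      # The new (upgraded) CVO pod should be Running with 0 restarts
      oc get pods -n openshift-cluster-version -o wide

      # The other firing alerts point at alertmanager-main; check its statefulset and pods
      oc get statefulset alertmanager-main -n openshift-monitoring
      oc get pods -n openshift-monitoring | grep alertmanager-main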

      Link to this job's TestGrid for reference.

       

            Arkadeep Sen (rh-ee-arsen)
            Jamo Luhrsen (jluhrsen)
            Votes: 0
            Watchers: 2
