Type: Bug
Resolution: Cannot Reproduce
Priority: Major
job link
This failure is coming from the test case "Check if alerts are firing during or after upgrade success"; the relevant test log snippet is here:
May 4 09:57:07.721: Unexpected alerts fired or pending during the upgrade:
  alert ClusterOperatorDown fired for 1290 seconds with labels: {endpoint="metrics", instance="10.0.184.18:9099", job="cluster-version-operator", name="monitoring", namespace="openshift-cluster-version", pod="cluster-version-operator-6f9db9dd74-bfnzr", service="cluster-version-operator", severity="critical", version="4.10.12"}
  alert KubePodNotReady fired for 1650 seconds with labels: {namespace="openshift-monitoring", pod="alertmanager-main-0", severity="warning"}
  alert KubeStatefulSetReplicasMismatch fired for 1170 seconds with labels: {container="kube-rbac-proxy-main", endpoint="https-main", job="kube-state-metrics", namespace="openshift-monitoring", service="kube-state-metrics", severity="warning", statefulset="alertmanager-main"}

Failure May 4 09:57:07.721: Unexpected alerts fired or pending during the upgrade:
  alert ClusterOperatorDown fired for 1290 seconds with labels: {endpoint="metrics", instance="10.0.184.18:9099", job="cluster-version-operator", name="monitoring", namespace="openshift-cluster-version", pod="cluster-version-operator-6f9db9dd74-bfnzr", service="cluster-version-operator", severity="critical", version="4.10.12"}
  alert KubePodNotReady fired for 1650 seconds with labels: {namespace="openshift-monitoring", pod="alertmanager-main-0", severity="warning"}
  alert KubeStatefulSetReplicasMismatch fired for 1170 seconds with labels: {container="kube-rbac-proxy-main", endpoint="https-main", job="kube-state-metrics", namespace="openshift-monitoring", service="kube-state-metrics", severity="warning", statefulset="alertmanager-main"}

github.com/openshift/origin/test/extended/util/disruption.(*chaosMonkeyAdapter).Test(0xc001afb0e0, 0xc003111068)
	github.com/openshift/origin/test/extended/util/disruption/disruption.go:192 +0x32f
k8s.io/kubernetes/test/e2e/chaosmonkey.(*Chaosmonkey).Do.func1()
	k8s.io/kubernetes@v1.23.0/test/e2e/chaosmonkey/chaosmonkey.go:90 +0x6a
created by k8s.io/kubernetes/test/e2e/chaosmonkey.(*Chaosmonkey).Do
	k8s.io/kubernetes@v1.23.0/test/e2e/chaosmonkey/chaosmonkey.go:87 +0x8c
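For anyone re-triaging this by hand, a minimal Go sketch of how the firing alerts in the failure above can be pulled out of the cluster's monitoring stack via the Prometheus HTTP query API (/api/v1/query). This is not the origin test code; the PROM_URL and TOKEN environment variables are placeholders you must supply yourself (e.g. the thanos-querier route in openshift-monitoring and a token from `oc whoami -t`).

package main

import (
	"crypto/tls"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
)

func main() {
	// Assumptions: PROM_URL is a reachable Prometheus/Thanos query endpoint and
	// TOKEN is a bearer token with permission to query it.
	promURL := os.Getenv("PROM_URL")
	token := os.Getenv("TOKEN")

	// The same kind of selector the failure message is built from: firing ALERTS
	// series for the alert names listed in the test output above.
	query := `ALERTS{alertstate="firing",alertname=~"ClusterOperatorDown|KubePodNotReady|KubeStatefulSetReplicasMismatch"}`

	req, err := http.NewRequest("GET", promURL+"/api/v1/query?query="+url.QueryEscape(query), nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "Bearer "+token)

	// CI cluster routes often use self-signed certs; skip verification in this sketch only.
	client := &http.Client{Transport: &http.Transport{TLSClientConfig: &tls.Config{InsecureSkipVerify: true}}}
	resp, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	var out map[string]interface{}
	if err := json.Unmarshal(body, &out); err != nil {
		panic(err)
	}
	// Print the raw query result; each returned series carries the alert labels
	// shown in the failure message.
	fmt.Printf("%v\n", out["data"])
}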
The pod mentioned above is the CVO pod. You can see the period it was down in the intervals graph
at the top of the Prow job page: it was down for ~20 minutes, and that was before the node-reboot
phase of the upgrade started. Looking at the 'oc get pods' output in the gather-extra artifacts,
the new (upgraded) CVO pod is up with no restarts. So something was probably breaking/broken
during the initial part of the upgrade process.
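For completeness, a minimal client-go sketch of the same 'oc get pods -n openshift-cluster-version' check, printing pod phase and restart counts the way they were read out of the gather-extra artifacts. The kubeconfig path is an assumption (default ~/.kube/config pointing at the cluster under test).

package main

import (
	"context"
	"fmt"
	"path/filepath"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/homedir"
)

func main() {
	// Assumption: kubeconfig at the default location points at the cluster under test.
	kubeconfig := filepath.Join(homedir.HomeDir(), ".kube", "config")
	cfg, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Equivalent of `oc get pods -n openshift-cluster-version`: list the CVO pods
	// and print each pod's phase plus the sum of its container restart counts.
	pods, err := clientset.CoreV1().Pods("openshift-cluster-version").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, p := range pods.Items {
		restarts := int32(0)
		for _, cs := range p.Status.ContainerStatuses {
			restarts += cs.RestartCount
		}
		fmt.Printf("%s\tphase=%s\trestarts=%d\n", p.Name, p.Status.Phase, restarts)
	}
}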
Link to this job's TestGrid for reference.
Relates to: CFE-564 Sprint-223 SDN embedding work - Arkadeep (Closed)