-
Story
-
Resolution: Not a Bug
In [this job|https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_ovn-kubernetes/1331/pull-ci-openshift-ovn-kubernetes-master-4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade/1583427008199659520], we see spyglass matching these times:
$ cat e2e-events_20221021-130143.json | jq '.items[]|select(.locator|test("ExtremelyHighIndividualControlPlaneCPU"))'
{
  "level": "Info",
  "locator": "alert/ExtremelyHighIndividualControlPlaneCPU node/ip-10-0-165-15.ec2.internal ns/openshift-kube-apiserver",
  "message": "ALERTS{alertname=\"ExtremelyHighIndividualControlPlaneCPU\", alertstate=\"pending\", instance=\"ip-10-0-165-15.ec2.internal\", namespace=\"openshift-kube-apiserver\", prometheus=\"openshift-monitoring/k8s\", severity=\"critical\"}",
  "from": "2022-10-21T13:21:33Z",
  "to": "2022-10-21T13:26:33Z"
}
{
  "level": "Warning",
  "locator": "alert/ExtremelyHighIndividualControlPlaneCPU node/ip-10-0-165-15.ec2.internal ns/openshift-kube-apiserver",
  "message": "ALERTS{alertname=\"ExtremelyHighIndividualControlPlaneCPU\", alertstate=\"firing\", instance=\"ip-10-0-165-15.ec2.internal\", namespace=\"openshift-kube-apiserver\", prometheus=\"openshift-monitoring/k8s\", severity=\"warning\"}",
  "from": "2022-10-21T13:26:33Z",
  "to": "2022-10-21T13:39:01Z"
}
yet the job shows:
[sig-arch][Feature:ClusterUpgrade] Cluster should remain functional during upgrade [Disruptive] [Serial] 1h23m28s
{ fail [github.com/openshift/origin/test/extended/util/disruption/disruption.go:197]: Oct 21 14:23:24.339: Unexpected alerts fired or pending during the upgrade:
alert ExtremelyHighIndividualControlPlaneCPU fired for 750 seconds with labels: {instance="ip-10-0-165-15.ec2.internal", namespace="openshift-kube-apiserver", severity="warning"}
Ginkgo exit error 1: exit with code 1}
i.e., the junit xml says 13:26:33 and the prow output says 14:23:24. That is nearly an hour apart. I suspect 13:26:33 is closer to the truth, because 14:23 is at the very end of the chart.
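To make the comparison concrete, here is a minimal Python sketch of the timestamp arithmetic. It uses only the values quoted above. The failure message prints "Oct 21 14:23:24.339" without a year or time zone, so treating it as 2022 and UTC is an assumption, and the variable names are illustrative only.

```python
from datetime import datetime, timezone


def parse_utc(stamp: str) -> datetime:
    """Parse an RFC 3339 style UTC timestamp such as 2022-10-21T13:26:33Z."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


# Interval boundaries copied from the e2e-events JSON quoted above.
pending_from = parse_utc("2022-10-21T13:21:33Z")
firing_from = parse_utc("2022-10-21T13:26:33Z")
firing_to = parse_utc("2022-10-21T13:39:01Z")

# Timestamp from the failure message. ASSUMPTION: year 2022 and UTC,
# fractional seconds dropped; the message itself states neither.
failure_logged = parse_utc("2022-10-21T14:23:24Z")

pending_seconds = int((firing_from - pending_from).total_seconds())
firing_seconds = int((firing_to - firing_from).total_seconds())
gap_seconds = int((failure_logged - firing_from).total_seconds())

print(f"pending for {pending_seconds}s")
print(f"firing for {firing_seconds}s")
print(f"failure logged {gap_seconds}s after firing started")
```

Under those assumptions the firing interval is 748 seconds, which is close to the 750 seconds in the failure message, and the failure was logged 3411 seconds (56m51s) after the alert started firing. That would be consistent with 14:23:24 being the time the test reported the failure rather than the time the alert fired, though the data quoted here does not confirm that on its own.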
- is related to
-
TRT-595 Improve Cluster Alert Tests
- Closed