Type: Story
Resolution: Not a Bug
Priority: Undefined
Fix Version/s: None
Affects Version/s: None
Labels: None
Blocked: False
Blocked Reason: None
Ready: False

In [this job|https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_ovn-kubernetes/1331/pull-ci-openshift-ovn-kubernetes-master-4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade/1583427008199659520], we see spyglass matching these times:

$ cat e2e-events_20221021-130143.json |jq '.items[]|select(.locator|test("ExtremelyHighIndividualControlPlaneCPU"))'
{
  "level": "Info",
  "locator": "alert/ExtremelyHighIndividualControlPlaneCPU node/ip-10-0-165-15.ec2.internal ns/openshift-kube-apiserver",
  "message": "ALERTS{alertname=\"ExtremelyHighIndividualControlPlaneCPU\", alertstate=\"pending\", instance=\"ip-10-0-165-15.ec2.internal\", namespace=\"openshift-kube-apiserver\", prometheus=\"openshift-monitoring/k8s\", severity=\"critical\"}",
  "from": "2022-10-21T13:21:33Z",
  "to": "2022-10-21T13:26:33Z"
}
{
  "level": "Warning",
  "locator": "alert/ExtremelyHighIndividualControlPlaneCPU node/ip-10-0-165-15.ec2.internal ns/openshift-kube-apiserver",
  "message": "ALERTS{alertname=\"ExtremelyHighIndividualControlPlaneCPU\", alertstate=\"firing\", instance=\"ip-10-0-165-15.ec2.internal\", namespace=\"openshift-kube-apiserver\", prometheus=\"openshift-monitoring/k8s\", severity=\"warning\"}",
  "from": "2022-10-21T13:26:33Z",
  "to": "2022-10-21T13:39:01Z"
}

yet the job shows:

: [sig-arch][Feature:ClusterUpgrade] Cluster should remain functional during upgrade [Disruptive] [Serial] 1h23m28s{  fail [github.com/openshift/origin/test/extended/util/disruption/disruption.go:197]: Oct 21 14:23:24.339: Unexpected alerts fired or pending during the upgrade:

alert ExtremelyHighIndividualControlPlaneCPU fired for 750 seconds with labels: {instance="ip-10-0-165-15.ec2.internal", namespace="openshift-kube-apiserver", severity="warning"}
Ginkgo exit error 1: exit with code 1}

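i.e., the e2e-events JSON says the alert started firing at 13:26:33, while the prow output is stamped 14:23:24. That's roughly an hour's difference (about 57 minutes). I believe 13:26:33 is closer to when the alert actually fired, because 14:23 falls at the end of the chart.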
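
As a rough cross-check of the two timestamps, the sketch below computes the length of the firing interval from the events JSON and the gap to the time in the failure message. It assumes the "Oct 21 14:23:24.339" timestamp is UTC and in 2022; the variable and function names are illustrative only and do not come from the test code.

```python
from datetime import datetime, timezone


def parse_utc(ts: str) -> datetime:
    """Parse a timestamp like 2022-10-21T13:26:33Z as a UTC datetime."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


# "from" and "to" of the firing interval, copied from the events JSON above.
firing_from = parse_utc("2022-10-21T13:26:33Z")
firing_to = parse_utc("2022-10-21T13:39:01Z")

# Timestamp printed in the failure message.
# Assumption: the year is 2022 and the time zone is UTC.
reported_at = datetime(2022, 10, 21, 14, 23, 24, 339000, tzinfo=timezone.utc)

# How long the alert was firing according to the events JSON.
firing_seconds = (firing_to - firing_from).total_seconds()

# How far the failure-message timestamp is from the start of firing.
gap_seconds = (reported_at - firing_from).total_seconds()

print(f"firing interval: {firing_seconds:.0f}s")
print(f"gap between firing start and failure message: {gap_seconds / 60:.1f} min")
```

The firing interval from the events JSON comes to 748 seconds, close to the "fired for 750 seconds" in the failure message. One possible reading, not confirmed here, is that both refer to the same firing window and that 14:23:24 is when the failure message was written rather than when the alert fired.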

is related to

TRT-595 Improve Cluster Alert Tests

Closed

Assignee: Unassigned

Reporter: Dennis Periquet

Votes: 0

Watchers: 2

Created: 2022/10/24 2:18 PM

Updated: 2023/03/03 3:25 PM

Resolved: 2023/03/03 3:25 PM
