Type: Bug
Resolution: Unresolved
Priority: Minor
Affects Version: 4.19
Quality / Stability / Reliability
Severity: Low
Description of problem:
There is an occasional flake in the e2e-agnostic-operator job:
--- FAIL: TestIntegrationCVO_gracefulStepDown (0.63s)
    start_integration_test.go:337: the controller should create a lock record on a config map
    start_integration_test.go:361: verify the controller writes a leadership change event
    start_integration_test.go:367: no leader election events found in []v1.Event(nil)
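For context, here is a minimal, self-contained sketch of the kind of check the failure message suggests: list the Events in the per-test namespace and require one that records the leadership change. This is not the actual start_integration_test.go code; the kubeconfig path and the reason/message matching are illustrative assumptions, and the namespace name is taken from the log excerpt further below.

package main

import (
	"context"
	"fmt"
	"strings"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumption: a kubeconfig for a live cluster; the integration test builds its
	// client differently.
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Per-test namespace name copied from the CVO log excerpt in this report.
	events, err := client.CoreV1().Events("e2e-cvo-trw6pv").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}

	found := false
	for _, ev := range events.Items {
		// Illustrative matching only; the real test presumably keys on a specific
		// reason or message for the leadership change event.
		if strings.Contains(ev.Reason, "LeaderElection") || strings.Contains(ev.Message, "became leader") {
			found = true
			break
		}
	}
	if !found {
		// Mirrors the shape of the observed failure message.
		fmt.Printf("no leader election events found in %#v\n", events.Items)
	}
}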
There are also some errors in the logs from the CVO under test, but it is unclear whether they are related to the failure:
E0516 23:51:38.827284 20844 event.go:368] "Unable to write event (may retry after sleeping)" err="can't create an event with namespace 'openshift-cluster-version' in namespace 'e2e-cvo-trw6pv'" event="&Event{ObjectMeta:{version.184026f74212e7e6 openshift-cluster-version 0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] [] []},InvolvedObject:ObjectReference{Kind:ClusterVersion,Namespace:openshift-cluster-version,Name:version,UID:,APIVersion:config.openshift.io/v1,ResourceVersion:,FieldPath:,},Reason:RetrievePayload,Message:Retrieving and verifying payload version=\"0.0.1\" image=\"arbitrary/release:image\",Source:EventSource{Component:e2e-cvo-trw6pv,Host:,},FirstTimestamp:2025-05-16 23:51:38.827065318 +0000 UTC m=+1.446833464,LastTimestamp:2025-05-16 23:51:38.827065318 +0000 UTC m=+1.446833464,Count:1,Type:Normal,EventTime:0001-01-01 00:00:00 +0000 UTC,Series:nil,Action:,Related:nil,ReportingController:e2e-cvo-trw6pv,ReportingInstance:,}"
...
E0516 23:51:38.827370 20844 event.go:368] "Unable to write event (may retry after sleeping)" err="can't create an event with namespace 'openshift-cluster-version' in namespace 'e2e-cvo-j4p4'" event="&Event{ObjectMeta:{version.184026f742135add openshift-cluster-version 0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] [] ...
...
E0516 23:51:38.898947 20844 task.go:128] "Unhandled Error" err="error running apply for configmap \"e2e-cvo-trw6pv/config1\" (1 of 2): Get \"https://api.ci-op-5q7vt5nz-bed9b.ci.azure.devcluster.openshift.com:6443/api/v1/namespaces/e2e-cvo-trw6pv/configmaps/config1\": context canceled" logger="UnhandledError"
E0516 23:51:38.898951 20844 task.go:128] "Unhandled Error" err="error running apply for configmap \"e2e-cvo-trw6pv/config2\" (2 of 2): Get \"https://api.ci-op-5q7vt5nz-bed9b.ci.azure.devcluster.openshift.com:6443/api/v1/namespaces/e2e-cvo-trw6pv/configmaps/config2\": context canceled" logger="UnhandledError"
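The "can't create an event with namespace ... in namespace ..." errors look like the standard guard in a namespaced event client that rejects events whose namespace differs from the namespace the sink is scoped to. A minimal, self-contained sketch of that behavior is below; the guard is re-implemented here purely for illustration (it is neither the CVO's nor client-go's actual code), and the namespace and event values are copied from the log excerpt above. Whether this is connected to the missing leader election events is speculation.

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// createEvent mimics the guard a namespaced event client applies: an event whose
// namespace differs from the client's namespace is rejected instead of created.
func createEvent(clientNamespace string, event *corev1.Event) error {
	if clientNamespace != "" && event.Namespace != clientNamespace {
		return fmt.Errorf("can't create an event with namespace '%s' in namespace '%s'",
			event.Namespace, clientNamespace)
	}
	return nil // a real client would POST the event to the API server here
}

func main() {
	// The CVO's events name the openshift-cluster-version namespace...
	ev := &corev1.Event{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "version.184026f74212e7e6",
			Namespace: "openshift-cluster-version",
		},
		InvolvedObject: corev1.ObjectReference{
			Kind:       "ClusterVersion",
			Namespace:  "openshift-cluster-version",
			Name:       "version",
			APIVersion: "config.openshift.io/v1",
		},
		Reason: "RetrievePayload",
	}
	// ...while the sink in the failing run appears to be scoped to the per-test
	// namespace, so the create is rejected with the same error text as the logs.
	if err := createEvent("e2e-cvo-trw6pv", ev); err != nil {
		fmt.Println(err)
	}
}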
Version-Release number of selected component (if applicable):
The earliest known occurrence is on a 4.19 presubmit in PR 1164 (whose changes are almost certainly unrelated).
How reproducible:
Not very; only 8 occurrences found in the current CI presubmit history (AFAIK the test platform keeps roughly 3 months of CI job artifacts):
- https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-version-operator/1196/pull-ci-openshift-cluster-version-operator-main-e2e-agnostic-operator/1923508870458642432 Executed: 2025-05-17 00:40:40 UTC
- https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-version-operator/1190/pull-ci-openshift-cluster-version-operator-main-e2e-agnostic-operator/1923374535944441856 Executed: 2025-05-16 15:46:49 UTC
- https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-version-operator/1188/pull-ci-openshift-cluster-version-operator-main-e2e-agnostic-operator/1920163842013270016 Executed: 2025-05-07 19:08:42 UTC
- https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-version-operator/1170/pull-ci-openshift-cluster-version-operator-main-e2e-agnostic-operator/1919720203562782720 Executed: 2025-05-06 13:45:50 UTC
- https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-version-operator/1165/pull-ci-openshift-cluster-version-operator-main-e2e-agnostic-operator/1912329833501691904 Executed: 2025-04-16 04:19:08 UTC
- https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-version-operator/1176/pull-ci-openshift-cluster-version-operator-main-e2e-agnostic-operator/1912177036063936512 Executed: 2025-04-15 18:12:00 UTC
- https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-version-operator/1172/pull-ci-openshift-cluster-version-operator-main-e2e-agnostic-operator/1907439871123787776 Executed: 2025-04-02 16:28:13 UTC
- https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-version-operator/1164/pull-ci-openshift-cluster-version-operator-main-e2e-agnostic-operator/1895160731263832064 Executed: 2025-02-27 18:15:38 UTC
Additional info:
The e2e-agnostic-operator job runs the CVO tests that need a live cluster to work, so it is not practical to try go test -count to reproduce locally.
I initially suspected OTA-1531-related work of causing this flake, but the first related code change merged on May 6 and there are occurrences from before that, so it is likely unrelated.