OpenShift Bugs / OCPBUGS-56872

TestIntegrationCVO_gracefulStepDown flakes in e2e-agnostic-operator CI job


    • Quality / Stability / Reliability
    • Low

      Description of problem:

      There is an occasional flake in the e2e-agnostic-operator job:

       --- FAIL: TestIntegrationCVO_gracefulStepDown (0.63s)
          start_integration_test.go:337: the controller should create a lock record on a config map
          start_integration_test.go:361: verify the controller writes a leadership change event
          start_integration_test.go:367: no leader election events found in
              []v1.Event(nil)
      

      The CVO under test also logs some errors, but I am unsure whether they are related to the failure:

      E0516 23:51:38.827284   20844 event.go:368] "Unable to write event (may retry after sleeping)" err="can't create an event with namespace 'openshift-cluster-version' in namespace 'e2e-cvo-trw6pv'" event="&Event{ObjectMeta:{version.184026f74212e7e6  openshift-cluster-version    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] [] []},InvolvedObject:ObjectReference{Kind:ClusterVersion,Namespace:openshift-cluster-version,Name:version,UID:,APIVersion:config.openshift.io/v1,ResourceVersion:,FieldPath:,},Reason:RetrievePayload,Message:Retrieving and verifying payload version=\"0.0.1\" image=\"arbitrary/release:image\",Source:EventSource{Component:e2e-cvo-trw6pv,Host:,},FirstTimestamp:2025-05-16 23:51:38.827065318 +0000 UTC m=+1.446833464,LastTimestamp:2025-05-16 23:51:38.827065318 +0000 UTC m=+1.446833464,Count:1,Type:Normal,EventTime:0001-01-01 00:00:00 +0000 UTC,Series:nil,Action:,Related:nil,ReportingController:e2e-cvo-trw6pv,ReportingInstance:,}"
      ...
      E0516 23:51:38.827370   20844 event.go:368] "Unable to write event (may retry after sleeping)" err="can't create an event with namespace 'openshift-cluster-version' in namespace 'e2e-cvo-j4p4'" event="&Event{ObjectMeta:{version.184026f742135add  openshift-cluster-version    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] []
      ...
      E0516 23:51:38.898947   20844 task.go:128] "Unhandled Error" err="error running apply for configmap \"e2e-cvo-trw6pv/config1\" (1 of 2): Get \"https://api.ci-op-5q7vt5nz-bed9b.ci.azure.devcluster.openshift.com:6443/api/v1/namespaces/e2e-cvo-trw6pv/configmaps/config1\": context canceled" logger="UnhandledError"
      E0516 23:51:38.898951   20844 task.go:128] "Unhandled Error" err="error running apply for configmap \"e2e-cvo-trw6pv/config2\" (2 of 2): Get \"https://api.ci-op-5q7vt5nz-bed9b.ci.azure.devcluster.openshift.com:6443/api/v1/namespaces/e2e-cvo-trw6pv/configmaps/config2\": context canceled" logger="UnhandledError"
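
For context on the "Unable to write event" errors above: as far as I can tell, the message text matches a namespace guard in the Kubernetes event client, which refuses to create an event whose namespace differs from the namespace the client is bound to. The following is a minimal Go sketch of that kind of guard, not the actual client code. The `event` and `namespacedEventSink` types and the `create` method are hypothetical names introduced only for illustration. It does not establish whether the rejected events are related to the flake.

```go
package main

import "fmt"

// event is a hypothetical, stripped-down stand-in for a Kubernetes Event.
// Only the namespace matters for this illustration.
type event struct {
	Namespace string
	Reason    string
}

// namespacedEventSink is a hypothetical stand-in for an event client that is
// bound to a single namespace (here, the per-test e2e namespace).
type namespacedEventSink struct {
	ns     string
	stored []event
}

// create rejects events whose namespace differs from the sink's namespace
// and stores all others. The error text is copied from the log above.
func (s *namespacedEventSink) create(e event) error {
	if s.ns != "" && e.Namespace != s.ns {
		return fmt.Errorf("can't create an event with namespace '%v' in namespace '%v'", e.Namespace, s.ns)
	}
	s.stored = append(s.stored, e)
	return nil
}

func main() {
	sink := &namespacedEventSink{ns: "e2e-cvo-trw6pv"}

	// An event addressed to a different namespace is rejected, as in the log.
	err := sink.create(event{Namespace: "openshift-cluster-version", Reason: "RetrievePayload"})
	fmt.Println("mismatched namespace:", err)

	// An event addressed to the sink's own namespace is accepted.
	err = sink.create(event{Namespace: "e2e-cvo-trw6pv", Reason: "LeaderElection"})
	fmt.Println("matching namespace:", err, "stored:", len(sink.stored))
}
```

Under this reading, the RetrievePayload event is dropped only because it targets openshift-cluster-version while the recorder is scoped to the test namespace. Events created in the test namespace itself would pass such a guard.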
      

      Version-Release number of selected component (if applicable):

      The earliest occurrence is on a 4.19 presubmit in PR1164 (which is almost certainly unrelated).

      How reproducible:

      Not very reproducible. I found just 8 occurrences in the current CI presubmit history (as far as I know, the test platform keeps about 3 months of CI job artifacts):

      Additional info:

      The e2e-agnostic-operator job runs the CVO tests that need a live cluster, so it is not practical to reproduce the flake with go test -count.

      I initially suspected that OTA-1531-related work caused this flake, but the first related code change merged on May 6 and there are occurrences from before that date, so it is likely unrelated.

              Unassigned
              afri@afri.cz Petr Muller
              Jia Liu