OpenShift Bugs / OCPBUGS-22286

CNO pod restart in hypershift CI


Details

    • No
    • Sprint: SDN Sprint 244, SDN Sprint 245
    • 2
    • Rejected
    • False

    Description

      This is a clone of issue OCPBUGS-18569. The following is the description of the original issue:

      We are seeing flaky cluster-network-operator (CNO) pod restarts in HyperShift CI, on the HyperShift control plane. The log of the previous (restarted) CNO container from one affected run:

      https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_hypershift/2967/pull-ci-openshift-hypershift-main-e2e-kubevirt-aws-ovn/1699008879737704448/artifacts/e2e-kubevirt-aws-ovn/run-e2e-local/artifacts/TestCreateCluster/namespaces/e2e-clusters-pvhd5-example-s6skm/core/pods/logs/cluster-network-operator-78fd774c97-7w7dg-cluster-network-operator-previous.log

      W0905 11:42:53.359515       1 builder.go:106] graceful termination failed, controllers failed with error: failed to get infrastructure name: infrastructureName not set in infrastructure 'cluster'
      

      The current backoff is retry.DefaultBackoff, which is tuned for 409 Conflict errors and stops retrying after less than one second in total:

      var DefaultBackoff = wait.Backoff{
      	Steps:    4,
      	Duration: 10 * time.Millisecond,
      	Factor:   5.0,
      	Jitter:   0.1,
      }
      

      Elsewhere in the codebase, retry.DefaultBackoff is used with retry.RetryOnConflict(), where it is appropriate. Here, however, we need to retry for much longer and much less frequently.


            People:
              Patryk Diak (pdiak@redhat.com)
              OpenShift Prow Bot (openshift-crt-jira-prow)
              Anurag Saxena
              Votes: 0
              Watchers: 5
