OpenShift Bugs / OCPBUGS-18569

CNO pod restart in hypershift CI

    • Release Note Not Required
    • In Progress

      We are seeing flaky CNO pod restarts in HyperShift CI, on the HyperShift control plane.

      https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_hypershift/2967/pull-ci-openshift-hypershift-main-e2e-kubevirt-aws-ovn/1699008879737704448/artifacts/e2e-kubevirt-aws-ovn/run-e2e-local/artifacts/TestCreateCluster/namespaces/e2e-clusters-pvhd5-example-s6skm/core/pods/logs/cluster-network-operator-78fd774c97-7w7dg-cluster-network-operator-previous.log

      W0905 11:42:53.359515       1 builder.go:106] graceful termination failed, controllers failed with error: failed to get infrastructure name: infrastructureName not set in infrastructure 'cluster'
      

      The current backoff is set to retry.DefaultBackoff, which is appropriate for 409 conflicts but gives up after less than 1s of retrying in total:

      var DefaultBackoff = wait.Backoff{
      	Steps:    4,
      	Duration: 10 * time.Millisecond,
      	Factor:   5.0,
      	Jitter:   0.1,
      }
      

      Elsewhere in the codebase, retry.DefaultBackoff is used with retry.RetryOnConflict() where it is appropriate, but we need to retry for much longer here and much less frequently.

            sjenning Seth Jennings
            Jean Chen Jean Chen