OCPBUGS-22286: CNO pod restart in hypershift CI

    • No
    • SDN Sprint 244, SDN Sprint 245
    • 2
    • Rejected
    • False
      This is a clone of issue OCPBUGS-18569. The following is the description of the original issue:

      We are seeing flaky CNO pod restarts in hypershift CI on the hypershift control plane.

      https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_hypershift/2967/pull-ci-openshift-hypershift-main-e2e-kubevirt-aws-ovn/1699008879737704448/artifacts/e2e-kubevirt-aws-ovn/run-e2e-local/artifacts/TestCreateCluster/namespaces/e2e-clusters-pvhd5-example-s6skm/core/pods/logs/cluster-network-operator-78fd774c97-7w7dg-cluster-network-operator-previous.log

      W0905 11:42:53.359515       1 builder.go:106] graceful termination failed, controllers failed with error: failed to get infrastructure name: infrastructureName not set in infrastructure 'cluster'
      

      The current backoff is set to retry.DefaultBackoff, which is appropriate for 409 conflicts but retries for less than 1s in total:

      var DefaultBackoff = wait.Backoff{
      	Steps:    4,
      	Duration: 10 * time.Millisecond,
      	Factor:   5.0,
      	Jitter:   0.1,
      }
      

      Elsewhere in the codebase, retry.DefaultBackoff is used with retry.RetryOnConflict() where it is appropriate, but we need to retry for much longer here and much less frequently.

            pdiak@redhat.com Patryk Diak
            openshift-crt-jira-prow OpenShift Prow Bot
            Anurag Saxena Anurag Saxena