OpenShift Bugs / OCPBUGS-38006

[capi] sometimes cluster-capi-operator pod stuck in CrashLoopBackOff on osp


    • Priority: Critical
    • Sprint: ShiftStack Sprint 259, ShiftStack Sprint 260
    • Release Note: Release Note Not Required
    • Status: In Progress

      Description of problem:

    The cluster-capi-operator pod sometimes gets stuck in CrashLoopBackOff on OpenStack (osp).

      Version-Release number of selected component (if applicable):

      4.17.0-0.nightly-2024-08-01-213905    

      How reproducible:

          Sometimes

      Steps to Reproduce:

    1. Create an OpenStack cluster with the TechPreviewNoUpgrade feature set.
    2. Check the status of the cluster-capi-operator pod.

      Actual results:

The cluster-capi-operator pod cycles between Running and CrashLoopBackOff:
      $ oc get po                               
      cluster-capi-operator-74dfcfcb9d-7gk98          0/1     CrashLoopBackOff   6 (2m54s ago)   41m
      
      $ oc get po         
      cluster-capi-operator-74dfcfcb9d-7gk98          1/1     Running   7 (7m52s ago)   46m
      
      $ oc get po                                                               
      cluster-capi-operator-74dfcfcb9d-7gk98          0/1     CrashLoopBackOff   7 (2m24s ago)   50m
      
      E0806 03:44:00.584669       1 kind.go:66] "kind must be registered to the Scheme" err="no kind is registered for the type v1alpha7.OpenStackCluster in scheme \"github.com/openshift/cluster-capi-operator/cmd/cluster-capi-operator/main.go:86\"" logger="controller-runtime.source.EventHandler"
      E0806 03:44:00.685539       1 controller.go:203] "Could not wait for Cache to sync" err="failed to wait for clusteroperator caches to sync: timed out waiting for cache to be synced for Kind *v1alpha7.OpenStackCluster" controller="clusteroperator" controllerGroup="config.openshift.io" controllerKind="ClusterOperator"
      I0806 03:44:00.685610       1 internal.go:516] "Stopping and waiting for non leader election runnables"
      I0806 03:44:00.685620       1 internal.go:520] "Stopping and waiting for leader election runnables"
      I0806 03:44:00.685646       1 controller.go:240] "Shutdown signal received, waiting for all workers to finish" controller="secret" controllerGroup="" controllerKind="Secret"
      I0806 03:44:00.685706       1 controller.go:240] "Shutdown signal received, waiting for all workers to finish" controller="cluster" controllerGroup="cluster.x-k8s.io" controllerKind="Cluster"
      I0806 03:44:00.685712       1 controller.go:242] "All workers finished" controller="cluster" controllerGroup="cluster.x-k8s.io" controllerKind="Cluster"
      I0806 03:44:00.685717       1 controller.go:240] "Shutdown signal received, waiting for all workers to finish" controller="secret" controllerGroup="" controllerKind="Secret"
      I0806 03:44:00.685722       1 controller.go:242] "All workers finished" controller="secret" controllerGroup="" controllerKind="Secret"
      I0806 03:44:00.685718       1 controller.go:242] "All workers finished" controller="secret" controllerGroup="" controllerKind="Secret"
      I0806 03:44:00.685720       1 controller.go:240] "Shutdown signal received, waiting for all workers to finish" controller="clusteroperator" controllerGroup="config.openshift.io" controllerKind="ClusterOperator"
      I0806 03:44:00.685823       1 recorder_in_memory.go:80] &Event{ObjectMeta:{dummy.17e906d425f7b2e1  dummy    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] [] []},InvolvedObject:ObjectReference{Kind:Pod,Namespace:dummy,Name:dummy,UID:,APIVersion:v1,ResourceVersion:,FieldPath:,},Reason:CustomResourceDefinitionUpdateFailed,Message:Failed to update CustomResourceDefinition.apiextensions.k8s.io/openstackclusters.infrastructure.cluster.x-k8s.io: Put "https://172.30.0.1:443/apis/apiextensions.k8s.io/v1/customresourcedefinitions/openstackclusters.infrastructure.cluster.x-k8s.io": context canceled,Source:EventSource{Component:cluster-capi-operator-capi-installer-apply-client,Host:,},FirstTimestamp:2024-08-06 03:44:00.685748961 +0000 UTC m=+302.946052179,LastTimestamp:2024-08-06 03:44:00.685748961 +0000 UTC m=+302.946052179,Count:1,Type:Warning,EventTime:0001-01-01 00:00:00 +0000 UTC,Series:nil,Action:,Related:nil,ReportingController:,ReportingInstance:,}
      I0806 03:44:00.719743       1 capi_installer_controller.go:309] "CAPI Installer Controller is Degraded" logger="CapiInstallerController" controller="clusteroperator" controllerGroup="config.openshift.io" controllerKind="ClusterOperator" ClusterOperator="cluster-api" namespace="" name="cluster-api" reconcileID="6fa96361-4dc2-4865-b1b3-f92378c002cc"
      E0806 03:44:00.719942       1 controller.go:329] "Reconciler error" err="error during reconcile: failed to set conditions for CAPI Installer controller: failed to sync status: failed to update cluster operator status: client rate limiter Wait returned an error: context canceled" controller="clusteroperator" controllerGroup="config.openshift.io" controllerKind="ClusterOperator" ClusterOperator="cluster-api" namespace="" name="cluster-api" reconcileID="6fa96361-4dc2-4865-b1b3-f92378c002cc"
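The first error line above points at the likely root cause: controller-runtime refuses to start an event source for v1alpha7.OpenStackCluster because that kind was never added to the operator's runtime scheme. The clusteroperator controller's cache can therefore never report "synced", the wait times out, the manager shuts down, and the kubelet restarts the pod until it lands in CrashLoopBackOff. A minimal, self-contained sketch of that failure mode (the Scheme type here is a toy stand-in for controller-runtime's runtime.Scheme, and the GVK strings are illustrative, not the operator's actual code):

```go
package main

import (
	"errors"
	"fmt"
)

// Scheme is a toy stand-in for controller-runtime's runtime.Scheme:
// it only remembers which group/version/kind strings were registered.
type Scheme struct {
	kinds map[string]bool
}

func NewScheme() *Scheme { return &Scheme{kinds: map[string]bool{}} }

func (s *Scheme) Register(gvk string) { s.kinds[gvk] = true }

// StartInformer mirrors the failure in the log: watching a kind that
// was never registered to the scheme errors out immediately, so the
// cache can never report "synced" and the manager gives up and exits.
func (s *Scheme) StartInformer(gvk string) error {
	if !s.kinds[gvk] {
		return errors.New("no kind is registered for the type " + gvk)
	}
	return nil
}

func main() {
	s := NewScheme()
	// Suppose only a newer API version is registered at startup...
	s.Register("infrastructure.cluster.x-k8s.io/v1beta1, Kind=OpenStackCluster")

	// ...while a watch is established against the older one.
	err := s.StartInformer("infrastructure.cluster.x-k8s.io/v1alpha7, Kind=OpenStackCluster")
	fmt.Println(err)
}
```

In the real operator, the fix would go in one of two directions: ensure every API version it watches (here v1alpha7) is added to the scheme before the manager starts, or stop watching the stale version; which one applies depends on the versions the operator is meant to support.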

      Expected results:

    The cluster-capi-operator pod stays in Running status.

      Additional info:

          

            Martin André (maandre@redhat.com)
            Zhaohua Sun (rhn-support-zhsun)
            Itshak Brown