OpenShift Bugs / OCPBUGS-41576

[capi] sometimes cluster-capi-operator pod stuck in CrashLoopBackOff on osp


    • Priority: Critical
    • Blocker: Yes
    • Sprint: ShiftStack Sprint 259, ShiftStack Sprint 260
    • Story Points: 2
    • Release Blocker: Rejected
    • Release Note Text:
      * Previously, when you deployed an {product-title} cluster that listed a `TechPreviewNoUpgrade` feature gate in its configuration on {rh-openstack}, the `cluster-capi-operator` pod crashed. This occurred because the Cluster CAPI Operator expected a different API version than the one that was served. With this release, an update to the Cluster CAPI Operator ensures that the Operator uses the correct version of the API so that this issue no longer occurs. (link:https://issues.redhat.com/browse/OCPBUGS-41576[*OCPBUGS-41576*])
    • Release Note Type: Bug Fix
    • Release Note Status: Done

      This is a clone of issue OCPBUGS-38006. The following is the description of the original issue:

      Description of problem:

          The cluster-capi-operator pod sometimes gets stuck in CrashLoopBackOff status on OpenStack (OSP) clusters.

      Version-Release number of selected component (if applicable):

      4.17.0-0.nightly-2024-08-01-213905    

      How reproducible:

          Sometimes

      Steps to Reproduce:

          1. Create an OpenStack (OSP) cluster with the TechPreviewNoUpgrade feature set enabled.
          2. Check the status of the cluster-capi-operator pod.

      Actual results:

      The cluster-capi-operator pod flaps between CrashLoopBackOff and Running as the restart counter climbs:
      $ oc get po                               
      cluster-capi-operator-74dfcfcb9d-7gk98          0/1     CrashLoopBackOff   6 (2m54s ago)   41m
      
      $ oc get po         
      cluster-capi-operator-74dfcfcb9d-7gk98          1/1     Running   7 (7m52s ago)   46m
      
      $ oc get po                                                               
      cluster-capi-operator-74dfcfcb9d-7gk98          0/1     CrashLoopBackOff   7 (2m24s ago)   50m
      
      E0806 03:44:00.584669       1 kind.go:66] "kind must be registered to the Scheme" err="no kind is registered for the type v1alpha7.OpenStackCluster in scheme \"github.com/openshift/cluster-capi-operator/cmd/cluster-capi-operator/main.go:86\"" logger="controller-runtime.source.EventHandler"
      E0806 03:44:00.685539       1 controller.go:203] "Could not wait for Cache to sync" err="failed to wait for clusteroperator caches to sync: timed out waiting for cache to be synced for Kind *v1alpha7.OpenStackCluster" controller="clusteroperator" controllerGroup="config.openshift.io" controllerKind="ClusterOperator"
      I0806 03:44:00.685610       1 internal.go:516] "Stopping and waiting for non leader election runnables"
      I0806 03:44:00.685620       1 internal.go:520] "Stopping and waiting for leader election runnables"
      I0806 03:44:00.685646       1 controller.go:240] "Shutdown signal received, waiting for all workers to finish" controller="secret" controllerGroup="" controllerKind="Secret"
      I0806 03:44:00.685706       1 controller.go:240] "Shutdown signal received, waiting for all workers to finish" controller="cluster" controllerGroup="cluster.x-k8s.io" controllerKind="Cluster"
      I0806 03:44:00.685712       1 controller.go:242] "All workers finished" controller="cluster" controllerGroup="cluster.x-k8s.io" controllerKind="Cluster"
      I0806 03:44:00.685717       1 controller.go:240] "Shutdown signal received, waiting for all workers to finish" controller="secret" controllerGroup="" controllerKind="Secret"
      I0806 03:44:00.685722       1 controller.go:242] "All workers finished" controller="secret" controllerGroup="" controllerKind="Secret"
      I0806 03:44:00.685718       1 controller.go:242] "All workers finished" controller="secret" controllerGroup="" controllerKind="Secret"
      I0806 03:44:00.685720       1 controller.go:240] "Shutdown signal received, waiting for all workers to finish" controller="clusteroperator" controllerGroup="config.openshift.io" controllerKind="ClusterOperator"
      I0806 03:44:00.685823       1 recorder_in_memory.go:80] &Event{ObjectMeta:{dummy.17e906d425f7b2e1  dummy    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] [] []},InvolvedObject:ObjectReference{Kind:Pod,Namespace:dummy,Name:dummy,UID:,APIVersion:v1,ResourceVersion:,FieldPath:,},Reason:CustomResourceDefinitionUpdateFailed,Message:Failed to update CustomResourceDefinition.apiextensions.k8s.io/openstackclusters.infrastructure.cluster.x-k8s.io: Put "https://172.30.0.1:443/apis/apiextensions.k8s.io/v1/customresourcedefinitions/openstackclusters.infrastructure.cluster.x-k8s.io": context canceled,Source:EventSource{Component:cluster-capi-operator-capi-installer-apply-client,Host:,},FirstTimestamp:2024-08-06 03:44:00.685748961 +0000 UTC m=+302.946052179,LastTimestamp:2024-08-06 03:44:00.685748961 +0000 UTC m=+302.946052179,Count:1,Type:Warning,EventTime:0001-01-01 00:00:00 +0000 UTC,Series:nil,Action:,Related:nil,ReportingController:,ReportingInstance:,}
      I0806 03:44:00.719743       1 capi_installer_controller.go:309] "CAPI Installer Controller is Degraded" logger="CapiInstallerController" controller="clusteroperator" controllerGroup="config.openshift.io" controllerKind="ClusterOperator" ClusterOperator="cluster-api" namespace="" name="cluster-api" reconcileID="6fa96361-4dc2-4865-b1b3-f92378c002cc"
      E0806 03:44:00.719942       1 controller.go:329] "Reconciler error" err="error during reconcile: failed to set conditions for CAPI Installer controller: failed to sync status: failed to update cluster operator status: client rate limiter Wait returned an error: context canceled" controller="clusteroperator" controllerGroup="config.openshift.io" controllerKind="ClusterOperator" ClusterOperator="cluster-api" namespace="" name="cluster-api" reconcileID="6fa96361-4dc2-4865-b1b3-f92378c002cc"
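
      The first error above is the root of the crash loop: controller-runtime cannot resolve a kind for v1alpha7.OpenStackCluster because that API version was never registered in the operator's scheme, so the clusteroperator cache never syncs, the manager shuts down, and kubelet restarts the pod. A minimal sketch of the same failure class, assuming only client-go/apimachinery and using v1.Pod as a stand-in (the OpenStack provider types are not reproduced here):

      package main

      import (
          "fmt"

          corev1 "k8s.io/api/core/v1"
          "k8s.io/apimachinery/pkg/runtime"
          clientgoscheme "k8s.io/client-go/kubernetes/scheme"
      )

      func main() {
          // A scheme with nothing registered, standing in for a scheme
          // that is missing the API version the server actually serves.
          empty := runtime.NewScheme()

          // Looking up a typed object in a scheme that does not know it
          // yields the same "no kind is registered for the type ..."
          // error seen in the log above, just for v1.Pod here.
          if _, _, err := empty.ObjectKinds(&corev1.Pod{}); err != nil {
              fmt.Println("lookup failed:", err)
          }

          // Registering the group/version first makes the lookup
          // succeed; watches can then sync their caches normally.
          full := runtime.NewScheme()
          _ = clientgoscheme.AddToScheme(full)
          gvks, _, _ := full.ObjectKinds(&corev1.Pod{})
          fmt.Println("registered as:", gvks)
      }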

      Expected results:

          The cluster-capi-operator pod remains in Running status.

      Additional info:
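
      The release-note text above describes the fix as making the Operator use the API version that is actually served. In controller-runtime terms, the manager's scheme must register the provider's served version before any watch on OpenStackCluster starts. A hedged sketch of that pattern, not the actual cluster-capi-operator change (the v1beta1 import is an assumption about which cluster-api-provider-openstack version is served; the log shows the scheme still expected v1alpha7):

      package main

      import (
          "k8s.io/apimachinery/pkg/runtime"
          utilruntime "k8s.io/apimachinery/pkg/util/runtime"
          clientgoscheme "k8s.io/client-go/kubernetes/scheme"
          ctrl "sigs.k8s.io/controller-runtime"

          // Assumption: the served CAPO API version. The real fix must
          // match whatever version the installed CRDs serve.
          infrav1 "sigs.k8s.io/cluster-api-provider-openstack/api/v1beta1"
      )

      func main() {
          scheme := runtime.NewScheme()
          utilruntime.Must(clientgoscheme.AddToScheme(scheme))

          // Register the OpenStack provider types for the version the
          // API server serves, so watches on OpenStackCluster can
          // resolve a kind instead of failing the cache sync.
          utilruntime.Must(infrav1.AddToScheme(scheme))

          mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{Scheme: scheme})
          if err != nil {
              panic(err)
          }
          _ = mgr // controllers would be wired up here before mgr.Start
      }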

          

            Martin André (maandre@redhat.com)
            OpenShift Prow Bot (openshift-crt-jira-prow)
            Itshak Brown