OpenShift Bugs / OCPBUGS-41576

[capi] sometimes cluster-capi-operator pod stuck in CrashLoopBackOff on osp


    • Priority: Critical
    • Blocker: Yes
    • Sprint: ShiftStack Sprint 259, ShiftStack Sprint 260
    • Story Points: 2
    • Release Blocker: Rejected
    • Release Note Text:
      * Previously, when you deployed an {product-title} cluster that listed a `TechPreviewNoUpgrade` feature gate in its configuration on {rh-openstack}, the `cluster-capi-operator` pod crashed. This occurred because the Cluster CAPI Operator expected a different API version than the one that was served. With this release, an update to the Cluster CAPI Operator ensures that the Operator uses the correct version of the API so that this issue no longer occurs. (link:https://issues.redhat.com/browse/OCPBUGS-41576[*OCPBUGS-41576*])
    • Release Note Type: Bug Fix
    • Release Note Status: Done

      This is a clone of issue OCPBUGS-38006. The following is the description of the original issue:

      Description of problem:

          The cluster-capi-operator pod sometimes gets stuck in CrashLoopBackOff status on OpenStack (OSP) clusters.

      Version-Release number of selected component (if applicable):

      4.17.0-0.nightly-2024-08-01-213905    

      How reproducible:

          Sometimes

      Steps to Reproduce:

          1. Create an OpenStack (OSP) cluster with the TechPreviewNoUpgrade feature set enabled.
          2. Check the status of the cluster-capi-operator pod.

      Actual results:

      The cluster-capi-operator pod flaps between CrashLoopBackOff and Running as the restart counter climbs:
      $ oc get po                               
      cluster-capi-operator-74dfcfcb9d-7gk98          0/1     CrashLoopBackOff   6 (2m54s ago)   41m
      
      $ oc get po         
      cluster-capi-operator-74dfcfcb9d-7gk98          1/1     Running   7 (7m52s ago)   46m
      
      $ oc get po                                                               
      cluster-capi-operator-74dfcfcb9d-7gk98          0/1     CrashLoopBackOff   7 (2m24s ago)   50m
      
      E0806 03:44:00.584669       1 kind.go:66] "kind must be registered to the Scheme" err="no kind is registered for the type v1alpha7.OpenStackCluster in scheme \"github.com/openshift/cluster-capi-operator/cmd/cluster-capi-operator/main.go:86\"" logger="controller-runtime.source.EventHandler"
      E0806 03:44:00.685539       1 controller.go:203] "Could not wait for Cache to sync" err="failed to wait for clusteroperator caches to sync: timed out waiting for cache to be synced for Kind *v1alpha7.OpenStackCluster" controller="clusteroperator" controllerGroup="config.openshift.io" controllerKind="ClusterOperator"
      I0806 03:44:00.685610       1 internal.go:516] "Stopping and waiting for non leader election runnables"
      I0806 03:44:00.685620       1 internal.go:520] "Stopping and waiting for leader election runnables"
      I0806 03:44:00.685646       1 controller.go:240] "Shutdown signal received, waiting for all workers to finish" controller="secret" controllerGroup="" controllerKind="Secret"
      I0806 03:44:00.685706       1 controller.go:240] "Shutdown signal received, waiting for all workers to finish" controller="cluster" controllerGroup="cluster.x-k8s.io" controllerKind="Cluster"
      I0806 03:44:00.685712       1 controller.go:242] "All workers finished" controller="cluster" controllerGroup="cluster.x-k8s.io" controllerKind="Cluster"
      I0806 03:44:00.685717       1 controller.go:240] "Shutdown signal received, waiting for all workers to finish" controller="secret" controllerGroup="" controllerKind="Secret"
      I0806 03:44:00.685722       1 controller.go:242] "All workers finished" controller="secret" controllerGroup="" controllerKind="Secret"
      I0806 03:44:00.685718       1 controller.go:242] "All workers finished" controller="secret" controllerGroup="" controllerKind="Secret"
      I0806 03:44:00.685720       1 controller.go:240] "Shutdown signal received, waiting for all workers to finish" controller="clusteroperator" controllerGroup="config.openshift.io" controllerKind="ClusterOperator"
      I0806 03:44:00.685823       1 recorder_in_memory.go:80] &Event{ObjectMeta:{dummy.17e906d425f7b2e1  dummy    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] [] []},InvolvedObject:ObjectReference{Kind:Pod,Namespace:dummy,Name:dummy,UID:,APIVersion:v1,ResourceVersion:,FieldPath:,},Reason:CustomResourceDefinitionUpdateFailed,Message:Failed to update CustomResourceDefinition.apiextensions.k8s.io/openstackclusters.infrastructure.cluster.x-k8s.io: Put "https://172.30.0.1:443/apis/apiextensions.k8s.io/v1/customresourcedefinitions/openstackclusters.infrastructure.cluster.x-k8s.io": context canceled,Source:EventSource{Component:cluster-capi-operator-capi-installer-apply-client,Host:,},FirstTimestamp:2024-08-06 03:44:00.685748961 +0000 UTC m=+302.946052179,LastTimestamp:2024-08-06 03:44:00.685748961 +0000 UTC m=+302.946052179,Count:1,Type:Warning,EventTime:0001-01-01 00:00:00 +0000 UTC,Series:nil,Action:,Related:nil,ReportingController:,ReportingInstance:,}
      I0806 03:44:00.719743       1 capi_installer_controller.go:309] "CAPI Installer Controller is Degraded" logger="CapiInstallerController" controller="clusteroperator" controllerGroup="config.openshift.io" controllerKind="ClusterOperator" ClusterOperator="cluster-api" namespace="" name="cluster-api" reconcileID="6fa96361-4dc2-4865-b1b3-f92378c002cc"
      E0806 03:44:00.719942       1 controller.go:329] "Reconciler error" err="error during reconcile: failed to set conditions for CAPI Installer controller: failed to sync status: failed to update cluster operator status: client rate limiter Wait returned an error: context canceled" controller="clusteroperator" controllerGroup="config.openshift.io" controllerKind="ClusterOperator" ClusterOperator="cluster-api" namespace="" name="cluster-api" reconcileID="6fa96361-4dc2-4865-b1b3-f92378c002cc"
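
      The first error above is the root of the crash loop: controller-runtime cannot resolve a kind for v1alpha7.OpenStackCluster because that API version was never registered in the operator's scheme, so the clusteroperator cache never syncs, the manager shuts down, and kubelet restarts the pod. A minimal sketch of the same failure class, assuming only client-go/apimachinery and using v1.Pod as a stand-in (the OpenStack provider types are not reproduced here):

      package main

      import (
          "fmt"

          corev1 "k8s.io/api/core/v1"
          "k8s.io/apimachinery/pkg/runtime"
          clientgoscheme "k8s.io/client-go/kubernetes/scheme"
      )

      func main() {
          // A scheme with nothing registered, standing in for a scheme
          // that is missing the API version the server actually serves.
          empty := runtime.NewScheme()

          // Looking up a typed object in a scheme that does not know it
          // yields the same "no kind is registered for the type ..."
          // error seen in the log above, just for v1.Pod here.
          if _, _, err := empty.ObjectKinds(&corev1.Pod{}); err != nil {
              fmt.Println("lookup failed:", err)
          }

          // Registering the group/version first makes the lookup
          // succeed; watches can then sync their caches normally.
          full := runtime.NewScheme()
          _ = clientgoscheme.AddToScheme(full)
          gvks, _, _ := full.ObjectKinds(&corev1.Pod{})
          fmt.Println("registered as:", gvks)
      }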

      Expected results:

          The cluster-capi-operator pod remains in Running status.

      Additional info:
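
      The release-note text above describes the fix as making the Operator use the API version that is actually served. In controller-runtime terms, the manager's scheme must register the provider's served version before any watch on OpenStackCluster starts. A hedged sketch of that pattern, not the actual cluster-capi-operator change (the v1beta1 import is an assumption about which cluster-api-provider-openstack version is served; the log shows the scheme still expected v1alpha7):

      package main

      import (
          "k8s.io/apimachinery/pkg/runtime"
          utilruntime "k8s.io/apimachinery/pkg/util/runtime"
          clientgoscheme "k8s.io/client-go/kubernetes/scheme"
          ctrl "sigs.k8s.io/controller-runtime"

          // Assumption: the served CAPO API version. The real fix must
          // match whatever version the installed CRDs serve.
          infrav1 "sigs.k8s.io/cluster-api-provider-openstack/api/v1beta1"
      )

      func main() {
          scheme := runtime.NewScheme()
          utilruntime.Must(clientgoscheme.AddToScheme(scheme))

          // Register the OpenStack provider types for the version the
          // API server serves, so watches on OpenStackCluster can
          // resolve a kind instead of failing the cache sync.
          utilruntime.Must(infrav1.AddToScheme(scheme))

          mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{Scheme: scheme})
          if err != nil {
              panic(err)
          }
          _ = mgr // controllers would be wired up here before mgr.Start
      }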

          

            Martin André (maandre@redhat.com)
            OpenShift Prow Bot (openshift-crt-jira-prow)
            Itshak Brown