Loading...

XML

Word

Printable

Type: Bug
Resolution: Unresolved
Priority: Undefined
Fix Version/s: 4.22.0
Affects Version/s: 4.22
Component/s: Cloud Compute / Cluster API Providers
Labels:
- component-regression

Activity Type:
None
Blocked:
False
Blocked Reason:

Hide

None

Show
None
Story Points:
None
Severity:
None
Regression:
None

Target Backport Versions:
None
Target Version:

4.22.0
Release Blocker:
Approved
Sprint:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Impact Score:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Summary

Since Feb 12, the capi-controllers and capi-operator pods crash-loop on every run of periodic-ci-openshift-release-main-nightly-4.22-e2e-metal-ipi-ovn-ipv6-techpreview. This causes a cascade of failures:

ClusterOperatorDegraded alert fires for ~1.5h
Pathological Back-off restarting events detected
Cluster stability checks fail
Operators reported as not on the cluster version

18 consecutive failures since Feb 12 with 0 passes. The last passing run was Feb 10.

This is NOT a recurrence of the same code change as OCPBUGS-74161 (cluster-api-provider-metal3#57 was reverted Jan 21 and no new PRs have merged in that repo since). The root cause is different.

Error Output

All failures show the same pattern:

32 events happened too frequently
event [namespace/openshift-cluster-api pod/capi-controllers-* - Back-off restarting failed container]
event [namespace/openshift-cluster-api-operator pod/capi-operator-* - Back-off restarting failed container]

ClusterOperatorDegraded alert:

ClusterOperatorDegraded was at or above info for at least 1h29m on metal/amd64/ovn/ha (maxAllowed=1s): firing for 1h29m

Affected Variant

Platform: metal
Network: ovn (IPv6)
FeatureSet: techpreview
Topology: ha
Architecture: amd64
Installer: ipi
Upgrade: none

Regression Details

Regression ID	Test Name	Max Failures
35504	pathological event: excessive Back-off restarting	17
35506	verify the cluster readiness and stability	23
35503	operators on the cluster version	17
35500	ClusterOperatorDegraded alert	18
35482	container restarts in ns/openshift-cluster-api	open
35502	container restarts in ns/openshift-cluster-api-operator	open

Timeline

Last passing run: 2026-02-10T20:21:33Z
First failing run: 2026-02-12T10:58:08Z
First failing payload: 4.22.0-0.nightly-2026-02-12-105448
Regression opened in Component Readiness: 2026-02-14 to 2026-02-15

Payload Analysis

Payload 4.22.0-0.nightly-2026-02-12-105448 contained 9 new PRs. None directly modify cluster-api-provider-metal3. No new PRs have been merged in openshift/cluster-api-provider-metal3 since the Jan 21 revert (PR #60).

OCPBUGS-74161 - Previous occurrence caused by cluster-api-provider-metal3#57 (reverted). This is a different root cause.
Sample failed run with 32 pathological events

links to

openshift/cluster-capi-operator#463: OCPBUGS-77036,OCPCLOUD-3342: provider images disconnected support

Assignee:: Theo Barber-Bany

Reporter:: Devan Goodwin

QA Contact:: Zhaohua Sun

Need Info From:: None

Votes:: 0 Vote for this issue

Watchers:: 6 Start watching this issue

Created:: 2026/02/18 2:26 PM

Updated:: 2026/02/24 12:58 PM

capi-controllers and capi-operator pods crash-looping on metal/ipv6/techpreview since Feb 12

Summary

Error Output

Affected Variant

Regression Details

Timeline

Payload Analysis

Related

Details

Description

Summary

Error Output

Affected Variant

Regression Details

Timeline

Payload Analysis

Related

Attachments

Issue Links

Easy Agile Planning Poker

Activity

People

Dates