Uploaded image for project: 'OpenShift Bugs'
  1. OpenShift Bugs
  2. OCPBUGS-77036

capi-controllers and capi-operator pods crash-looping on metal/ipv6/techpreview since Feb 12

    • None
    • False
    • Hide

      None

      Show
      None
    • None
    • None
    • None
    • None
    • Approved
    • None
    • None
    • None
    • None
    • None
    • None
    • None
    • None

      Summary

      Since Feb 12, the capi-controllers and capi-operator pods crash-loop on every run of periodic-ci-openshift-release-main-nightly-4.22-e2e-metal-ipi-ovn-ipv6-techpreview. This causes a cascade of failures:

      • ClusterOperatorDegraded alert fires for ~1.5h
      • Pathological Back-off restarting events detected
      • Cluster stability checks fail
      • Operators reported as not on the cluster version

      18 consecutive failures since Feb 12 with 0 passes. The last passing run was Feb 10.

      This is NOT a recurrence of the same code change as OCPBUGS-74161 (cluster-api-provider-metal3#57 was reverted Jan 21 and no new PRs have merged in that repo since). The root cause is different.

      Error Output

      All failures show the same pattern:

      32 events happened too frequently
      event [namespace/openshift-cluster-api pod/capi-controllers-* - Back-off restarting failed container]
      event [namespace/openshift-cluster-api-operator pod/capi-operator-* - Back-off restarting failed container]
      

      ClusterOperatorDegraded alert:

      ClusterOperatorDegraded was at or above info for at least 1h29m on metal/amd64/ovn/ha (maxAllowed=1s): firing for 1h29m
      

      Affected Variant

      • Platform: metal
      • Network: ovn (IPv6)
      • FeatureSet: techpreview
      • Topology: ha
      • Architecture: amd64
      • Installer: ipi
      • Upgrade: none

      Regression Details

      Regression ID Test Name Max Failures
      35504 pathological event: excessive Back-off restarting 17
      35506 verify the cluster readiness and stability 23
      35503 operators on the cluster version 17
      35500 ClusterOperatorDegraded alert 18
      35482 container restarts in ns/openshift-cluster-api open
      35502 container restarts in ns/openshift-cluster-api-operator open

      Timeline

      • Last passing run: 2026-02-10T20:21:33Z
      • First failing run: 2026-02-12T10:58:08Z
      • First failing payload: 4.22.0-0.nightly-2026-02-12-105448
      • Regression opened in Component Readiness: 2026-02-14 to 2026-02-15

      Payload Analysis

      Payload 4.22.0-0.nightly-2026-02-12-105448 contained 9 new PRs. None directly modify cluster-api-provider-metal3. No new PRs have been merged in openshift/cluster-api-provider-metal3 since the Jan 21 revert (PR #60).

      Related

              rh-ee-tbarberb Theo Barber-Bany
              rhn-engineering-dgoodwin Devan Goodwin
              None
              None
              Zhaohua Sun Zhaohua Sun
              None
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

                Created:
                Updated: