OpenShift Bugs / OCPBUGS-77460

capm3-controller-manager crash looping on metal

      (Feel free to update this bug's summary to be more specific.)
      Component Readiness has found a potential regression in the following test:

      [Monitor:legacy-test-framework-invariants-pathological][sig-arch] events should not repeat pathologically

      Significant regression detected.
      Fisher's Exact probability of a regression: 100.00%.
      Test pass rate dropped from 100.00% to 85.37%.
      Regression has been triaged to one or more bugs.

      Sample (being evaluated) Release: 4.22
      Start Time: 2026-02-20T00:00:00Z
      End Time: 2026-02-27T12:00:00Z
      Success Rate: 85.37%
      Successes: 35
      Failures: 6
      Flakes: 0
      Base (historical) Release: 4.21
      Start Time: 2026-01-04T00:00:00Z
      End Time: 2026-02-03T23:59:59Z
      Success Rate: 100.00%
      Successes: 201
      Failures: 0
      Flakes: 0
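      As a sanity check on the numbers above, the following is a minimal sketch of a one-sided Fisher's exact test over the sample and base counts. It assumes the reported "probability of a regression" is roughly one minus this p-value; that mapping is an assumption, not confirmed Component Readiness behavior, and `fisher_one_sided` is a hypothetical helper name.

```python
from math import comb


def fisher_one_sided(sample_pass, sample_fail, base_pass, base_fail):
    """One-sided Fisher's exact p-value.

    Probability of seeing at least `sample_fail` failures in the sample,
    assuming sample and base share the same underlying failure rate
    (hypergeometric upper tail).
    """
    total = sample_pass + sample_fail + base_pass + base_fail
    total_fail = sample_fail + base_fail
    sample_size = sample_pass + sample_fail
    denom = comb(total, sample_size)
    upper = min(total_fail, sample_size)
    tail = sum(
        comb(total_fail, k) * comb(total - total_fail, sample_size - k)
        for k in range(sample_fail, upper + 1)
    )
    return tail / denom


# Counts taken from the report: 4.22 sample vs. 4.21 base.
pass_rate = 100 * 35 / (35 + 6)
p_value = fisher_one_sided(35, 6, 201, 0)

print(f"sample pass rate: {pass_rate:.2f}%")
print(f"one-sided p-value: {p_value:.3e}")
print(f"1 - p: {100 * (1 - p_value):.2f}%")
```

      With 6 failures in 41 sample runs against 0 in 201 base runs, the p-value is very small, which is consistent with the report rounding the regression probability to 100.00%.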

      View the test details report for additional context.

      [Monitor:legacy-node-invariants][sig-cluster-lifecycle] pathological event should not see excessive Back-off restarting failed containers
      {  13 events happened too frequently
      
      event [namespace/openshift-cluster-api node/master-2 pod/capm3-controller-manager-6bfb9d7479-wd7ff hmsg/a2f5239676 - Back-off restarting failed container manager in pod capm3-controller-manager-6bfb9d7479-wd7ff_openshift-cluster-api(62132d5b-0a73-42d7-90ed-36a1b98543a1)] happened 257 times
      event [namespace/openshift-cluster-api node/master-2 pod/capm3-controller-manager-6bfb9d7479-wd7ff hmsg/a2f5239676 - Back-off restarting failed container manager in pod capm3-controller-manager-6bfb9d7479-wd7ff_openshift-cluster-api(62132d5b-0a73-42d7-90ed-36a1b98543a1)] happened 273 times
      event [namespace/openshift-cluster-api node/master-2 pod/capm3-controller-manager-6bfb9d7479-wd7ff hmsg/a2f5239676 - Back-off restarting failed container manager in pod capm3-controller-manager-6bfb9d7479-wd7ff_openshift-cluster-api(62132d5b-0a73-42d7-90ed-36a1b98543a1)] happened 311 times
      event [namespace/openshift-cluster-api node/master-2 pod/capm3-controller-manager-6bfb9d7479-wd7ff hmsg/a2f5239676 - Back-off restarting failed container manager in pod capm3-controller-manager-6bfb9d7479-wd7ff_openshift-cluster-api(62132d5b-0a73-42d7-90ed-36a1b98543a1)] happened 326 times
      event [namespace/openshift-cluster-api node/master-2 pod/capm3-controller-manager-6bfb9d7479-wd7ff hmsg/a2f5239676 - Back-off restarting failed container manager in pod capm3-controller-manager-6bfb9d7479-wd7ff_openshift-cluster-api(62132d5b-0a73-42d7-90ed-36a1b98543a1)] happened 342 times
      event [namespace/openshift-cluster-api node/master-2 pod/capm3-controller-manager-6bfb9d7479-wd7ff hmsg/a2f5239676 - Back-off restarting failed container manager in pod capm3-controller-manager-6bfb9d7479-wd7ff_openshift-cluster-api(62132d5b-0a73-42d7-90ed-36a1b98543a1)] happened 380 times
      event [namespace/openshift-cluster-api node/master-2 pod/capm3-controller-manager-6bfb9d7479-wd7ff hmsg/a2f5239676 - Back-off restarting failed container manager in pod capm3-controller-manager-6bfb9d7479-wd7ff_openshift-cluster-api(62132d5b-0a73-42d7-90ed-36a1b98543a1)] happened 394 times
      event [namespace/openshift-cluster-api node/master-2 pod/capm3-controller-manager-6bfb9d7479-wd7ff hmsg/a2f5239676 - Back-off restarting failed container manager in pod capm3-controller-manager-6bfb9d7479-wd7ff_openshift-cluster-api(62132d5b-0a73-42d7-90ed-36a1b98543a1)] happened 432 times
      event [namespace/openshift-cluster-api node/master-2 pod/capm3-controller-manager-6bfb9d7479-wd7ff hmsg/a2f5239676 - Back-off restarting failed container manager in pod capm3-controller-manager-6bfb9d7479-wd7ff_openshift-cluster-api(62132d5b-0a73-42d7-90ed-36a1b98543a1)] happened 447 times
      event [namespace/openshift-cluster-api node/master-2 pod/capm3-controller-manager-6bfb9d7479-wd7ff hmsg/a2f5239676 - Back-off restarting failed container manager in pod capm3-controller-manager-6bfb9d7479-wd7ff_openshift-cluster-api(62132d5b-0a73-42d7-90ed-36a1b98543a1)] happened 462 times
      event [namespace/openshift-cluster-api node/master-2 pod/capm3-controller-manager-6bfb9d7479-wd7ff hmsg/a2f5239676 - Back-off restarting failed container manager in pod capm3-controller-manager-6bfb9d7479-wd7ff_openshift-cluster-api(62132d5b-0a73-42d7-90ed-36a1b98543a1)] happened 486 times
      event [namespace/openshift-cluster-api node/master-2 pod/capm3-controller-manager-6bfb9d7479-wd7ff hmsg/a2f5239676 - Back-off restarting failed container manager in pod capm3-controller-manager-6bfb9d7479-wd7ff_openshift-cluster-api(62132d5b-0a73-42d7-90ed-36a1b98543a1)] happened 502 times
      event [namespace/openshift-cluster-api node/master-2 pod/capm3-controller-manager-6bfb9d7479-wd7ff hmsg/a2f5239676 - Back-off restarting failed container manager in pod capm3-controller-manager-6bfb9d7479-wd7ff_openshift-cluster-api(62132d5b-0a73-42d7-90ed-36a1b98543a1)] happened 517 times}
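      All 13 entries above refer to the same pod and differ only in the repeat count. A minimal sketch for summarizing such output per pod follows; the regular expression assumes the line layout shown in this report (`pod/<name> ... happened <N> times`), and `summarize_events` is a hypothetical helper, not part of the test framework.

```python
import re

# Assumed layout, based on the lines in this report:
#   event [... pod/<name> hmsg/<hash> - <message>] happened <N> times
EVENT_RE = re.compile(r"pod/(?P<pod>\S+) .* happened (?P<count>\d+) times")


def summarize_events(lines):
    """Return {pod: (number_of_entries, highest_repeat_count)}."""
    counts = {}
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            counts.setdefault(match["pod"], []).append(int(match["count"]))
    return {pod: (len(c), max(c)) for pod, c in counts.items()}


# First and last entries from the report, copied verbatim.
sample = [
    "event [namespace/openshift-cluster-api node/master-2 pod/capm3-controller-manager-6bfb9d7479-wd7ff hmsg/a2f5239676 - Back-off restarting failed container manager in pod capm3-controller-manager-6bfb9d7479-wd7ff_openshift-cluster-api(62132d5b-0a73-42d7-90ed-36a1b98543a1)] happened 257 times",
    "event [namespace/openshift-cluster-api node/master-2 pod/capm3-controller-manager-6bfb9d7479-wd7ff hmsg/a2f5239676 - Back-off restarting failed container manager in pod capm3-controller-manager-6bfb9d7479-wd7ff_openshift-cluster-api(62132d5b-0a73-42d7-90ed-36a1b98543a1)] happened 517 times}",
]

for pod, (entries, peak) in summarize_events(sample).items():
    print(f"{pod}: {entries} entries, peak repeat count {peak}")
```

      Run against the full list, this collapses the output to a single pod with a peak count of 517 back-off events.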
      

      Also [Monitor:kubelet-container-restarts][sig-architecture] platform pods in ns/openshift-cluster-api should not exit an excessive amount of times

      E0224 15:38:18.785711       1 main.go:218] "problem running manager" err="failed to wait for metal3data caches to sync kind source: *v1alpha1.IPClaim: timed out waiting for cache to be synced for Kind *v1alpha1.IPClaim" logger="setup"
      

      Filed by: dgoodwin@redhat.com

              rh-ee-nbrubake Nolan Brubaker
              openshift-trt OpenShift Technical Release Team