Uploaded image for project: 'OpenShift Bugs'
  1. OpenShift Bugs
  2. OCPBUGS-32265

Intermittent Installation Failures Due to Non-Starting kube-controller-manager Pod

XMLWordPrintable

    • Moderate
    • No
    • False
    • Hide

      None

      Show
      None

      Description of problem:

      During several attempts to install OCP 4.13 on OSP 16.2, we observed intermittent installation failures across different network configurations. The specific issue causing these failures is the kube-controller-manager pod not starting, which leads to a degraded cluster state. This problem has been noted under various installation types.

      Version-Release number of selected component (if applicable):

      OCP :4.13.40
      on top of
      OSP :RHOS-16.2-RHEL-8-20240229.n.1

      How reproducible:

      The issue occurs intermittently, affecting some installations but not all

      Steps to Reproduce:

      1.Begin installation of OCP 4.13 on a cluster running OpenStack 16.2         2.Monitor the installation process.
      
          

      Actual results:

          The kube-controller-manager fails to start and results in a degraded state with the following error message: 
      2024-04-13 05:25:57.530 | level=error msg=Cluster operator kube-controller-manager Degraded is True with MissingStaticPodController_SyncError::StaticPods_Error: MissingStaticPodControllerDegraded: static pod lifecycle failure - static pod: "kube-controller-manager" in namespace: "openshift-kube-controller-manager" for revision: 10 on node: "ostest-svz99-master-2" didn't show up, waited: 3m0s
      ```"

      Expected results:

          The kube-controller-manager pod should start successfully without errors. Installation should be completed 

      Additional info:

      Must gather attached in the comment
      
      NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      kube-controller-manager                    4.13.40   True        True          True       53m     MissingStaticPodControllerDegraded: static pod lifecycle failure - static pod: "kube-controller-manager" in namespace: "openshift-kube-controller-manager" for revision: 10 on node: "ostest-svz99-master-2" didn't show up, waited: 3m0s...   

            maandre@redhat.com Martin André
            ykhodork Yaakov Khodorkovski
            Itshak Brown Itshak Brown
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

              Created:
              Updated: