  1. OpenShift Bugs
  2. OCPBUGS-36843

[Need backport to 4.13]: Static pods are not able to start on one node (fixed in 4.14)


    • Bug
    • Resolution: Can't Do
    • Major
    • None
    • 4.13.z
    • Node / Kubelet
    • None
    • None
    • False

      Description of problem:

      
      Customers are encountering the bug linked below in 4.13.z installs (so far). It causes install failures on a supported version.
      
      The fix in 4.14 needs to be backported to 4.13 so that users do not encounter this bug.
      
      https://issues.redhat.com/browse/OCPBUGS-2474
      
          

      Version-Release number of selected component (if applicable):

      
          

      How reproducible:

          Intermittent; the frequency is unknown. See the linked (closed) OCPBUGS ticket for more information, as QA was done there.
          

      Steps to Reproduce:

          1. Install a cluster with version < 4.14
          

      Actual results:

          Almost the same as what is described in the following KCS, which also points to the linked (closed) OCPBUGS ticket/Bugzilla:
      https://access.redhat.com/solutions/6976343 . Static pods fail to start, namely the kube-apiserver and guard pods.
      
          The kube-apiserver static pods fail to start and get stuck.
          

      Expected results:

          The kube-apiserver static pods should reach the Running state.
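
One way to check the expected state is to scan the pod list for kube-apiserver pods that are not Running. The following is a minimal sketch, assuming the input is the parsed JSON output of `oc get pods -n openshift-kube-apiserver -o json`. The function name `pods_not_running` and the node names in the sample are hypothetical and used only for illustration.

```python
def pods_not_running(pod_list: dict, name_prefix: str = "kube-apiserver") -> list[str]:
    """Return "<name>: <phase>" for pods matching name_prefix that are not Running."""
    stuck = []
    for pod in pod_list.get("items", []):
        name = pod.get("metadata", {}).get("name", "")
        # A pod that reports no phase at all is treated as Unknown.
        phase = pod.get("status", {}).get("phase", "Unknown")
        if name.startswith(name_prefix) and phase != "Running":
            stuck.append(f"{name}: {phase}")
    return stuck


# Hypothetical sample shaped like a Kubernetes PodList (node names are made up).
sample = {
    "items": [
        {"metadata": {"name": "kube-apiserver-node-a"}, "status": {"phase": "Running"}},
        {"metadata": {"name": "kube-apiserver-node-b"}, "status": {"phase": "Pending"}},
        {"metadata": {"name": "installer-6-node-b"}, "status": {"phase": "Succeeded"}},
    ]
}
print(pods_not_running(sample))
```

The prefix match also covers the guard pods if they are named with the same `kube-apiserver` prefix, which is an assumption about the naming in a given cluster.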
          

      Additional info:

      
      During install, ARO checks the health of the cluster operators, kube-apiserver in particular. When this install failure happens, which it does in many clusters, we get the following log. The errors are almost exactly the same as those in the KCS/Bugzilla:
      
                          "reason": "GuardController_SyncError::MissingStaticPodController_SyncError::StaticPods_Error",
                          "message": "GuardControllerDegraded: Missing operand on node <redacted>\nMissingStaticPodControllerDegraded: static pod lifecycle failure - static pod: \"kube-apiserver\" in namespace: \"openshift-kube-apiserver\" for revision: 6 on node: \"<redacted>\" didn't show up, waited: 4m45s\nStaticPodsDegraded: pod/kube-apiserver<redacted> \"kube-apiserver\" is terminated: Completed: \nStaticPodsDegraded: <redacted> \"kube-apiserver-cert-regeneration-controller\" is terminated: Completed: 
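
The condition above packs several controller errors into one `reason` and one `message`. As a reading aid, here is a minimal sketch that splits them apart, assuming (from the log shown, not from any documented format) that reasons are joined with `::` and that each message line starts with a `<Name>Degraded: ` prefix. The function name `split_degraded_condition` is hypothetical.

```python
def split_degraded_condition(reason: str, message: str) -> tuple[list[str], dict[str, list[str]]]:
    """Split a combined Degraded condition into its reasons and per-controller details."""
    reasons = [part for part in reason.split("::") if part]
    grouped: dict[str, list[str]] = {}
    for line in message.split("\n"):
        # Split only on the first ": " so colons inside the detail are preserved.
        prefix, sep, detail = line.partition(": ")
        if not sep:
            continue  # skip lines that carry no "<Name>Degraded: " prefix
        grouped.setdefault(prefix, []).append(detail.strip())
    return reasons, grouped


# Values taken from the log above (the StaticPodsDegraded lines are abbreviated).
reason = "GuardController_SyncError::MissingStaticPodController_SyncError::StaticPods_Error"
message = (
    "GuardControllerDegraded: Missing operand on node <redacted>\n"
    "MissingStaticPodControllerDegraded: static pod lifecycle failure - static pod: "
    "\"kube-apiserver\" in namespace: \"openshift-kube-apiserver\" for revision: 6 "
    "on node: \"<redacted>\" didn't show up, waited: 4m45s\n"
    "StaticPodsDegraded: pod/kube-apiserver<redacted> \"kube-apiserver\" is terminated: Completed: \n"
    "StaticPodsDegraded: <redacted> \"kube-apiserver-cert-regeneration-controller\" is terminated: Completed: "
)
reasons, grouped = split_degraded_condition(reason, message)
for name, details in grouped.items():
    print(name, len(details))
```

Grouping by prefix makes it easier to see that the `MissingStaticPodControllerDegraded` entry (the static pod that "didn't show up") is the one matching the symptom described in this ticket.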
      
      
          

            aos-node@redhat.com Node Team Bot Account
            jcueto@redhat.com Jose Cueto
            Sunil Choudhary