OpenShift Bugs / OCPBUGS-5475

kube-apiserver revision rollout gets stuck and needs kubelet to be restarted


Details

    • Rejected
    • False

    Description

      Description of problem:

      kube-apiserver revision rollout gets stuck
      
      etcd encryption is enabled, which is causing many new iterations of the encryption-config secret in the openshift-kube-apiserver namespace. Each new iteration in turn triggers a kube-apiserver revision rollout.
      
      The revision rollout gets stuck with the error below.
      ~~~
      2023-01-06T15:48:01.746113417Z E0106 15:48:01.746011       1 base_controller.go:272] MissingStaticPodController reconciliation failed: static pod lifecycle failure - static pod: "kube-apiserver" in namespace: "openshift-kube-apiserver" for revision: 206 on node: "<nodename>" didn't show up, waited: 4m15s
      ~~~
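When many operator log lines like the one above need triaging, it can help to pull out the revision, node, and wait time. The following is a minimal sketch that assumes the message keeps the shape quoted in this report; the function name `parse_missing_static_pod` and the regular expression are illustrative and not part of any OpenShift tooling.

```python
import re

# Pattern modeled on the single log line quoted in this report; other
# operator versions may word the message differently.
_MISSING_POD_RE = re.compile(
    r'static pod: "(?P<pod>[^"]+)" in namespace: "(?P<namespace>[^"]+)" '
    r'for revision: (?P<revision>\d+) on node: "(?P<node>[^"]+)" '
    r"didn't show up, waited: (?P<waited>\S+)"
)


def parse_missing_static_pod(line):
    """Return a dict of fields from the error line, or None if it does not match."""
    match = _MISSING_POD_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["revision"] = int(fields["revision"])  # revision is numeric in the log
    return fields
```

Grouping the parsed results by node and revision shows whether the same revision keeps failing on the same node.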
      The message below appears in the kube-apiserver-guard pod, which fails to start with a connection refused message.
      ~~~
      WARNING: kubelet did not terminate old kube-apiserver before new one.\"\n\n  # We failed to acquire exclusive lock, which means there is old kube-apiserver running in system.\n  # Since we utilize SO_REUSEPORT, we need to make sure the old kube-apiserver stopped listening.\n  #\n  # NOTE: This is a fallback for broken kubelet, if you observe this please report a bug.\n  echo -n \"Waiting for port 6443 to be released due to likely bug in kubelet or CRI-O
      ~~~
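The script text quoted above says it waits for port 6443 to be released. As a rough diagnostic from the node, one could check whether anything still accepts connections on that port. This is a sketch only, not the logic of the startup script itself; the function names are hypothetical, and a successful connect only shows that some process is listening, not which one.

```python
import socket
import time


def port_is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def wait_for_port_release(host, port, max_wait=30.0, interval=1.0):
    """Poll until nothing accepts connections on host:port.

    Returns True if the port was released within max_wait seconds.
    """
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        if not port_is_listening(host, port):
            return True
        time.sleep(interval)
    # One final check after the deadline has passed.
    return not port_is_listening(host, port)
```

For the case in this report, the call would be `wait_for_port_release("127.0.0.1", 6443)` on the affected node, assuming the old kube-apiserver is reachable on the loopback address.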
      Error seen in kubelet logs:
      ~~~
      Dec 30 22:26:45 <nodename> hyperkube[1642]: I1230 22:26:45.640186    1642 prober.go:121] "Probe failed" probeType="Readiness" pod="openshift-kube-apiserver/kube-apiserver-guard-<nodename>" podUID=7ca0d0b0-b69f-498a-a38e-a0e035baaaf1 containerName="guard" probeResult=failure output="Get \"https://10.162.19.137:6443/healthz\": dial tcp 10.162.19.137:6443: connect: connection refused"
      ~~~
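Kubelet entries like the one above carry their details as key=value pairs, so the failing probe's pod and output can be extracted mechanically. A minimal sketch, assuming the pairs follow the quoting seen in this log line (double quotes with backslash escapes, or bare tokens); `parse_probe_fields` is a hypothetical helper name.

```python
import re

# Matches key=value pairs where the value is either a double-quoted string
# (possibly containing backslash escapes) or a bare token without spaces.
_KV_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_probe_fields(line):
    """Return a dict of the key=value pairs found in a kubelet log line."""
    fields = {}
    for key, raw in _KV_RE.findall(line):
        if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
            # Strip the surrounding quotes and unescape embedded quotes.
            raw = raw[1:-1].replace('\\"', '"')
        fields[key] = raw
    return fields
```

Filtering the parsed entries on `probeResult == "failure"` and grouping by `pod` gives a quick view of how long the guard pod has been failing its readiness probe.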
      After the kube-apiserver pod was deleted, it did not come back up, and we got the error below.
      ~~~
      GuardController reconciliation failed: Missing operand on node <nodename>
      ~~~
      Finally, the pod came up after restarting the kubelet service on the node. The revision rollout for kube-apiserver progressed after that.

      Version-Release number of selected component (if applicable):

       

      How reproducible:

       

      Steps to Reproduce:

      1. Enable etcd encryption
      2. Observe kube-apiserver CO status and revision rollouts
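For step 2, one way to see which nodes have not reached the latest revision is to inspect the output of `oc get kubeapiserver cluster -o json`. The sketch below assumes the status carries `latestAvailableRevision` and a `nodeStatuses` list with `nodeName`, `currentRevision`, and `targetRevision`, as in the operator.openshift.io/v1 API; verify the field names against your cluster version. The helper name `nodes_behind_latest` is hypothetical.

```python
import json


def nodes_behind_latest(kubeapiserver_json):
    """Return (latest_revision, nodes_behind) from KubeAPIServer JSON text.

    nodes_behind is a list of (node_name, current_revision, target_revision)
    tuples for every node whose current revision is not the latest one.
    """
    status = json.loads(kubeapiserver_json).get("status", {})
    latest = status.get("latestAvailableRevision", 0)
    behind = []
    for node in status.get("nodeStatuses", []):
        current = node.get("currentRevision", 0)
        if current != latest:
            behind.append(
                (node.get("nodeName"), current, node.get("targetRevision", 0))
            )
    return latest, behind
```

A node that stays in the returned list with the same target revision across repeated checks is a candidate for the stuck rollout described in this report.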
      

      Actual results:

      kube-apiserver revision rollout gets stuck

      Expected results:

      The kube-apiserver revision rollout completes on all nodes without getting stuck.

      Additional info:

      must-gather and sosreport collected from the affected node during the issue: https://drive.google.com/drive/folders/1pzYX5P0untglQWTahtNeNRFP4_ARChsU?usp=sharing

          People

            rphillip@redhat.com Ryan Phillips
            rhn-support-alosingh Alok Singh
            Sunil Choudhary Sunil Choudhary