Details
Bug
Resolution: Duplicate
Critical
4.10.z
Rejected
False
Description
Description of problem:
The kube-apiserver revision rollout gets stuck. Etcd encryption is enabled on the cluster, which is creating many new iterations of the encryption-config secret in the openshift-kube-apiserver namespace. Each new iteration in turn triggers a kube-apiserver revision rollout, and the rollout gets stuck with the following error:
~~~
2023-01-06T15:48:01.746113417Z E0106 15:48:01.746011 1 base_controller.go:272] MissingStaticPodController reconciliation failed: static pod lifecycle failure - static pod: "kube-apiserver" in namespace: "openshift-kube-apiserver" for revision: 206 on node: "<nodename>" didn't show up, waited: 4m15s
~~~
The following message is seen in the kube-apiserver-guard pod, which is failing to start with a connection refused error:
~~~
WARNING: kubelet did not terminate old kube-apiserver before new one.\"\n\n # We failed to acquire exclusive lock, which means there is old kube-apiserver running in system.\n # Since we utilize SO_REUSEPORT, we need to make sure the old kube-apiserver stopped listening.\n #\n # NOTE: This is a fallback for broken kubelet, if you observe this please report a bug.\n echo -n \"Waiting for port 6443 to be released due to likely bug in kubelet or CRI-O
~~~
Error seen in the kubelet logs:
~~~
Dec 30 22:26:45 <nodename> hyperkube[1642]: I1230 22:26:45.640186 1642 prober.go:121] "Probe failed" probeType="Readiness" pod="openshift-kube-apiserver/kube-apiserver-guard-<nodename>" podUID=7ca0d0b0-b69f-498a-a38e-a0e035baaaf1 containerName="guard" probeResult=failure output="Get \"https://10.162.19.137:6443/healthz\": dial tcp 10.162.19.137:6443: connect: connection refused"
~~~
After the kube-apiserver pod was deleted, it did not come back up, and the following error was reported:
~~~
GuardController reconciliation failed: Missing operand on node <nodename>
~~~
Finally, the pod came up after the kubelet service was restarted on the node, and the kube-apiserver revision rollout progressed after that.
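The fallback message and the guard pod's readiness probe failure both come down to whether anything is accepting TCP connections on port 6443: the fallback waits for the old kube-apiserver to stop listening, while the probe reports "connection refused" because the new one is not listening. The Python sketch below illustrates that check with a bounded wait loop. It is not the actual kube-apiserver startup script (the quoted excerpt is shell). The function names port_is_listening and wait_for_port_release and the polling parameters are hypothetical, and the sketch assumes that a refused or timed-out connection means no listener.

```python
import socket
import time


def port_is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (hypothetical helper)."""
    try:
        # create_connection raises an OSError subclass (ConnectionRefusedError,
        # socket.timeout, ...) when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_port_release(host: str, port: int, deadline_s: float,
                          poll_s: float = 0.5) -> bool:
    """Poll until nothing listens on host:port.

    Returns True once the port is released, or False if deadline_s elapses
    while something is still listening. The port is always checked at least once.
    """
    end = time.monotonic() + deadline_s
    while True:
        if not port_is_listening(host, port):
            return True
        if time.monotonic() >= end:
            return False
        time.sleep(poll_s)


if __name__ == "__main__":
    # Simulate a lingering "old" server with a throwaway listener on an ephemeral port.
    old = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    old.bind(("127.0.0.1", 0))
    old.listen()
    port = old.getsockname()[1]
    print(port_is_listening("127.0.0.1", port))                 # True
    print(wait_for_port_release("127.0.0.1", port, 0.2, 0.05))  # False: still listening
    old.close()
    print(wait_for_port_release("127.0.0.1", port, 2.0))        # True: port released
```

Treating a timeout the same as a refusal keeps the sketch short; a real check would likely distinguish them, since a hung listener is not the same as a released port.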
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Enable etcd encryption.
2. Observe the kube-apiserver CO status and revision rollouts.
Actual results:
kube-apiserver revision rollout gets stuck
Expected results:
The kube-apiserver revision rollout completes on all nodes without getting stuck.
Additional info:
Must-gather and sosreport from the affected node, collected during the issue: https://drive.google.com/drive/folders/1pzYX5P0untglQWTahtNeNRFP4_ARChsU?usp=sharing