Details

Type: Bug
Resolution: Duplicate
Priority: Critical
Fix Version/s: 4.14
Affects Version/s: 4.10.z
Component/s: Node / Kubelet
Labels:

Release Blocker:
Rejected
Blocked:
False
Blocked Reason:

None


SFDC Cases Counter:
SFDC Cases Links:

Description

Description of problem:

kube-apiserver revision rollout gets stuck

Etcd encryption is enabled on the node, which is causing many new iterations of the encryption-config secret in the openshift-kube-apiserver namespace. Each new iteration in turn triggers a kube-apiserver revision rollout.

The revision rollout gets stuck with the error below.
~~~
2023-01-06T15:48:01.746113417Z E0106 15:48:01.746011       1 base_controller.go:272] MissingStaticPodController reconciliation failed: static pod lifecycle failure - static pod: "kube-apiserver" in namespace: "openshift-kube-apiserver" for revision: 206 on node: "<nodename>" didn't show up, waited: 4m15s
~~~
The message below is seen for the kube-apiserver-guard pod, which fails to start with a connection refused error.
~~~
WARNING: kubelet did not terminate old kube-apiserver before new one.\"\n\n  # We failed to acquire exclusive lock, which means there is old kube-apiserver running in system.\n  # Since we utilize SO_REUSEPORT, we need to make sure the old kube-apiserver stopped listening.\n  #\n  # NOTE: This is a fallback for broken kubelet, if you observe this please report a bug.\n  echo -n \"Waiting for port 6443 to be released due to likely bug in kubelet or CRI-O
~~~
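The warning above describes waiting for port 6443 to be released by an old kube-apiserver process. As a rough diagnostic sketch only (this is not the logic the static pod itself runs, and the function names `port_is_listening` and `wait_for_port_release` are hypothetical), the same condition could be checked from the affected node along these lines:

```python
import socket
import time


def port_is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def wait_for_port_release(host: str, port: int,
                          attempts: int = 30, delay: float = 1.0) -> bool:
    """Poll until nothing accepts connections on host:port.

    Returns True once the port is free, or False if something is
    still accepting connections after all attempts.
    """
    for _ in range(attempts):
        if not port_is_listening(host, port):
            return True
        time.sleep(delay)
    return False
```

For example, `wait_for_port_release("127.0.0.1", 6443)` returning False would suggest that an old process is still bound to the port. A refused connection only shows that nothing is accepting on that address; it does not identify which process held the port.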
Error seen in kubelet logs:
~~~
Dec 30 22:26:45 <nodename> hyperkube[1642]: I1230 22:26:45.640186    1642 prober.go:121] "Probe failed" probeType="Readiness" pod="openshift-kube-apiserver/kube-apiserver-guard-<nodename>" podUID=7ca0d0b0-b69f-498a-a38e-a0e035baaaf1 containerName="guard" probeResult=failure output="Get \"https://10.162.19.137:6443/healthz\": dial tcp 10.162.19.137:6443: connect: connection refused"
~~~
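When scanning a long kubelet journal for entries like the one above, it can help to pull the key=value fields out of each "Probe failed" line. The following is a minimal sketch, assuming each log entry sits on a single line in the format shown above. The helper name `parse_probe_failure` is hypothetical and not part of any kubelet tooling.

```python
import re
from typing import Dict, Optional

# Matches key="quoted value" (allowing \" escapes) or key=bare_value.
_FIELD = re.compile(r'(\w+)=(?:"((?:[^"\\]|\\.)*)"|(\S+))')


def parse_probe_failure(line: str) -> Optional[Dict[str, str]]:
    """Extract key=value fields from a kubelet "Probe failed" log line.

    Returns None for lines that are not probe-failure entries.
    """
    if '"Probe failed"' not in line:
        return None
    fields: Dict[str, str] = {}
    for key, quoted, bare in _FIELD.findall(line):
        value = quoted if quoted else bare
        # Undo the escaping that the logger applies to embedded quotes.
        fields[key] = value.replace('\\"', '"')
    return fields
```

Filtering the parsed entries on `pod` and `probeType` makes it easier to see when the guard pod's readiness probe started failing relative to the revision rollout.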
After deleting the kube-apiserver pod, it did not come back up and we got the error below.
~~~
GuardController reconcilliation failed: Missing operand on node <nodename>
~~~
Finally, the pod came up after restarting the kubelet service on the node. The revision rollout for kube-apiserver progressed after that.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

1. Enable etcd encryption
2. Observe kube-apiserver CO status and revision rollouts

Actual results:

kube-apiserver revision rollout gets stuck

Expected results:

kube-apiserver revision rollout completes without issues on the nodes

Additional info:

Must gather and sosreport from the affected node during the issue - https://drive.google.com/drive/folders/1pzYX5P0untglQWTahtNeNRFP4_ARChsU?usp=sharing

People

Assignee: Ryan Phillips

Reporter: Alok Singh

QA Contact: Sunil Choudhary

Votes: 0

Watchers: 6

Dates

Created: 2023/01/06 7:59 PM

Updated: 2023/05/08 3:29 PM

Resolved: 2023/05/08 3:29 PM