-
Bug
-
Resolution: Won't Do
-
Minor
-
None
-
4.19
-
Quality / Stability / Reliability
-
False
-
-
None
-
Critical
Description of problem:
I am testing node crash scenarios on a cluster running the latest 4.19 nightly build. I induce the crash by running echo c > /proc/sysrq-trigger on the node. After the crash completes, the node stays stuck in the NotReady state. On further analysis, the kubelet service logs complain about the certificate /var/lib/kubelet/pki/kubelet-client-current.pem, and the reason is that this file is empty. I also found some Pending CSRs:
csr-66nc8 81s kubernetes.io/kube-apiserver-client-kubelet system:serviceaccount:openshift-machine-config-operator:node-bootstrapper <none> Pending
csr-ldvdv 68s kubernetes.io/kube-apiserver-client-kubelet system:serviceaccount:openshift-machine-config-operator:node-bootstrapper <none> Pending
Once I approved those CSRs, removed the empty file /var/lib/kubelet/pki/kubelet-client-current.pem, and restarted the kubelet service, the nodes became Ready. I have been running similar tests on almost every 4.19 build, but we have only seen this issue in recent builds (within roughly the last 1.5 weeks).
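The manual workaround described above can be sketched roughly as follows. This is only an illustration of the steps from this report, not a recommended fix: the function names are made up for the example, the CSR approval is assumed to run from a workstation with cluster-admin access, and the certificate cleanup is assumed to run as root on the affected node.

```shell
#!/usr/bin/env bash
# Sketch of the manual workaround from this report.
# All function names are hypothetical and exist only for illustration.

KUBELET_CERT="${KUBELET_CERT:-/var/lib/kubelet/pki/kubelet-client-current.pem}"

# Succeeds only if the given path exists and has zero length.
cert_is_empty() {
    [ -e "$1" ] && [ ! -s "$1" ]
}

# Assumed to run where `oc` is logged in with cluster-admin rights.
# Approves every CSR that has no status yet (i.e. is still Pending).
approve_pending_csrs() {
    oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' \
        | xargs --no-run-if-empty oc adm certificate approve
}

# Assumed to run as root on the affected node.
# Removes the certificate file only if it is empty, then restarts kubelet.
reset_kubelet_client_cert() {
    if cert_is_empty "$KUBELET_CERT"; then
        rm -f -- "$KUBELET_CERT"
        systemctl restart kubelet
    else
        echo "certificate at $KUBELET_CERT is not empty; leaving it alone"
    fi
}
```

The order used in this report was: approve_pending_csrs first, then reset_kubelet_client_cert on each affected node. The emptiness check is a guard added for the sketch so that a valid certificate is never deleted by accident.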
Version-Release number of selected component (if applicable):
OCP: 4.19.0-0.nightly-2025-07-02-143253
How reproducible:
Near 100%
Steps to Reproduce:
1. Deploy a cluster on the latest 4.19 OCP build.
2. Crash some of the nodes using echo c > /proc/sysrq-trigger.
3. Observe the issue described above when the nodes recover.
Actual results:
Node is stuck in NotReady because the kubelet client certificate file /var/lib/kubelet/pki/kubelet-client-current.pem is empty.
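To check for the pending-CSR part of this symptom, a small filter over the default `oc get csr` table can list the affected request names. This is a sketch that assumes the CONDITION value is the last column, as in the output quoted in this report; the function name pending_csr_names is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `oc get csr` table output on stdin and
# prints the name (first column) of every row whose last column is "Pending".
pending_csr_names() {
    awk '$NF == "Pending" { print $1 }'
}

# Example usage on a live cluster (requires a logged-in `oc`):
#   oc get csr | pending_csr_names
```

Rows with a condition such as Approved,Issued and the header row are skipped, because their last field is not exactly Pending.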
Expected results:
Node should recover successfully
Additional info: