- Bug
- Resolution: Unresolved
- Normal
- None
- 4.15
- Low
- No
- False
Description of problem:
If the kubelet systemd service is restarted often enough to hit the systemd start limit (#DefaultStartLimitIntervalSec=10s), systemd stops the kubelet service and the OpenShift node is stuck in the NotReady state. This impacts all the VMIs running on the node:

[root@cc37-h25-000-r750 ~]# oc get vmis --all-namespaces | grep cc37-h33-000-r750
benchmark-runner   windows-vm-a2bb6137-0     7d10h   Running   10.130.1.31   cc37-h33-000-r750   False
benchmark-runner   windows-vm-a2bb6137-100   7d10h   Running   10.130.1.14   cc37-h33-000-r750   False

Systemd settings on the RHCOS node:

#DefaultRestartSec=100ms
#DefaultStartLimitIntervalSec=10s

Kubelet service logs:

systemctl status kubelet
× kubelet.service - Kubernetes Kubelet
     Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; preset: disabled)
    Drop-In: /etc/systemd/system/kubelet.service.d
             └─01-kubens.conf, 10-mco-default-env.conf, 10-mco-default-madv.conf, 10-mco-on-prem-wait-resolv.conf, 20-logging.conf, 20-nodenet.conf
     Active: failed (Result: start-limit-hit) since Tue 2024-08-27 01:22:44 UTC; 4min 54s ago
   Duration: 1.028s
    Process: 862624 ExecCondition=/bin/bash -c test -f /run/resolv-prepender-kni-conf-done || exit 255 (code=exited, status=0/SUCCESS)
    Process: 862625 ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests (code=exited, status=0/SUCCESS)
    Process: 862628 ExecStartPre=/usr/sbin/restorecon /usr/local/bin/kubenswrapper /usr/bin/kubensenter (code=exited, status=0/SUCCESS)
    Process: 862630 ExecStart=/usr/local/bin/kubenswrapper /usr/bin/kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=>
   Main PID: 862630 (code=exited, status=0/SUCCESS)
        CPU: 2.414s

Aug 27 01:22:45 cc37-h35-000-r750 systemd[1]: Failed to start Kubernetes Kubelet.
Aug 27 01:22:45 cc37-h35-000-r750 systemd[1]: kubelet.service: Start request repeated too quickly.
Aug 27 01:22:45 cc37-h35-000-r750 systemd[1]: kubelet.service: Failed with result 'start-limit-hit'.
Aug 27 01:22:45 cc37-h35-000-r750 systemd[1]: Failed to start Kubernetes Kubelet.
Aug 27 01:22:45 cc37-h35-000-r750 systemd[1]: kubelet.service: Start request repeated too quickly.
Aug 27 01:22:45 cc37-h35-000-r750 systemd[1]: kubelet.service: Failed with result 'start-limit-hit'.
Aug 27 01:22:45 cc37-h35-000-r750 systemd[1]: Failed to start Kubernetes Kubelet.
Aug 27 01:22:45 cc37-h35-000-r750 systemd[1]: kubelet.service: Start request repeated too quickly.
Aug 27 01:22:45 cc37-h35-000-r750 systemd[1]: kubelet.service: Failed with result 'start-limit-hit'.
Aug 27 01:22:45 cc37-h35-000-r750 systemd[1]: Failed to start Kubernetes Kubelet.
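
For triage, the effective start-limit values and the failed state of the unit can be inspected with `systemctl show`, and the rate-limit counter can be cleared with `systemctl reset-failed`. The following is only a sketch and was not part of the original investigation: the `UNIT` and `DRY_RUN` variables and the helper function names are hypothetical, and the script defaults to printing the commands rather than running them.

```shell
#!/usr/bin/env bash
# Sketch: inspect and clear the systemd start-limit state of kubelet.
# UNIT and DRY_RUN are hypothetical knobs for this sketch only.
set -u

UNIT="${UNIT:-kubelet.service}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Show the start-limit settings that apply to the unit and its last result.
inspect_start_limit() {
    run systemctl show "$UNIT" \
        -p StartLimitBurst -p StartLimitIntervalUSec \
        -p Result -p ActiveState
}

# Clear the failed state (this also resets the start rate limit counter),
# then start the unit again.
clear_start_limit() {
    run systemctl reset-failed "$UNIT"
    run systemctl start "$UNIT"
}

inspect_start_limit
```

Running `clear_start_limit` with `DRY_RUN=0` as root on the affected node would be one way to bring kubelet back manually; whether that is an acceptable workaround is outside what this report establishes.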
Version-Release number of selected component (if applicable):
4.15
How reproducible:
Always
Steps to Reproduce:
1. Install an OpenShift 4.15 cluster on bare metal.
2. Restart kubelet on one of the worker nodes multiple times within 10 seconds.
3. Observe the status of the kubelet service and of the affected node.
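
Steps 2 and 3 can be sketched as a small script. This is an illustration under assumptions, not the exact commands used for this report: `RESTARTS` and `DRY_RUN` are hypothetical variables, the restart count of 10 is an arbitrary value chosen to exceed the limit, and the script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Sketch: restart kubelet repeatedly inside the systemd start-limit window.
# Assumes it is run as root on a worker node. RESTARTS and DRY_RUN are
# hypothetical knobs for this sketch, not part of the original report.
set -u

RESTARTS="${RESTARTS:-10}"   # number of back-to-back restarts to issue
DRY_RUN="${DRY_RUN:-1}"      # 1 = only print the commands, 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reproduce() {
    local i
    for i in $(seq 1 "$RESTARTS"); do
        # Step 2: issue restarts back to back, well within 10 seconds.
        # A restart may fail once the limit is hit, so ignore the exit code.
        run systemctl restart kubelet.service || true
    done
    # Step 3: check the resulting unit state on the node.
    run systemctl status kubelet.service --no-pager || true
}

reproduce
```

The node status itself would then be checked from a host with cluster access, for example with `oc get nodes`.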
Actual results:
Kubelet fails to start, leaving the node in the NotReady state.
Expected results:
The kubelet service is running and the node is Ready to run workloads.
Additional info:
Must-gather, journal logs: https://drive.google.com/drive/folders/1A73Uh0nFyPk9raBCmt4fFgGjwjFxqAYW?usp=sharing