-
Bug
-
Resolution: Done-Errata
-
Major
-
4.14.z
This is a clone of issue OCPBUGS-33694. The following is the description of the original issue:
—
Description of problem:
kubelet does not start after reboot due to dependency issue
Version-Release number of selected component (if applicable):
OCP 4.14.23
How reproducible:
Every time at customer end
Steps to Reproduce:
1. Upgrade an OpenShift cluster (OVN-based) with kdump enabled to OCP 4.14.23
2. Check kubelet and crio status
Actual results:
kubelet and crio services are in dead state and do not start automatically after reboot, manual intervention is needed.

$ cat sos_commands/crio/systemctl_status_crio
○ crio.service - Container Runtime Interface for OCI (CRI-O)
     Loaded: loaded (/usr/lib/systemd/system/crio.service; disabled; preset: disabled)
    Drop-In: /etc/systemd/system/crio.service.d
             └─01-kubens.conf, 05-mco-ordering.conf, 10-mco-default-env.conf, 10-mco-default-madv.conf, 10-mco-profile-unix-socket.conf, 20-nodenet.conf
     Active: inactive (dead)
       Docs: https://github.com/cri-o/cri-o

$ cat sos_commands/openshift/systemctl_status_kubelet
○ kubelet.service - Kubernetes Kubelet
     Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; preset: disabled)
    Drop-In: /etc/systemd/system/kubelet.service.d
             └─01-kubens.conf, 10-mco-default-env.conf, 10-mco-default-madv.conf, 20-logging.conf, 20-nodenet.conf
     Active: inactive (dead)
Expected results:
kubelet and crio should start automatically.
Additional info:
I believe the recent patch that waits for kdump to start has introduced an ordering cycle.
https://github.com/openshift/machine-config-operator/pull/4213/files

May 09 19:12:05 network01 systemd[1]: network-online.target: Found dependency on kdump.service/start
May 09 19:13:48 network01 systemd[1]: ovs-configuration.service: Found ordering cycle on kdump.service/start
May 09 19:13:48 network01 systemd[1]: ovs-configuration.service: Job kdump.service/start deleted to break ordering cycle starting with ovs-configuration.service/start
May 12 21:20:57 network01 systemd[1]: node-valid-hostname.service: Found dependency on kdump.service/start
May 12 21:21:00 network01 kdumpctl[1389]: kdump: kexec: loaded kdump kernel
May 12 21:21:00 network01 kdumpctl[1389]: kdump: Starting kdump: [OK]
May 12 21:25:28 network01 systemd[1]: kdump.service: Found ordering cycle on network-online.target/start
May 12 21:25:28 network01 systemd[1]: kdump.service: Found dependency on node-valid-hostname.service/start
May 12 21:25:28 network01 systemd[1]: kdump.service: Found dependency on ovs-configuration.service/start
May 12 21:25:28 network01 systemd[1]: kdump.service: Found dependency on kdump.service/start
May 12 21:25:28 network01 systemd[1]: kdump.service: Job network-online.target/start deleted to break ordering cycle starting with kdump.service/start
May 12 21:25:31 network01 kdumpctl[1284]: kdump: kexec: loaded kdump kernel
May 12 21:25:31 network01 kdumpctl[1284]: kdump: Starting kdump: [OK]

To break a cycle, systemd deletes one of the jobs that is part of the cycle, so the corresponding service is not started.

Disabling kdump and rebooting the node helps; kubelet and crio then start automatically.
# systemctl disable kdump
# systemctl reboot

Make sure systemctl list-jobs shows no pending jobs. Once all jobs have completed, check the status of kubelet.
# systemctl list-jobs
# systemctl status kubelet
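
To confirm which jobs systemd dropped on a given boot, the "deleted to break ordering cycle" messages can be filtered out of the journal. The following is a minimal sketch, not part of any fix: deleted_jobs is a hypothetical helper name, and it assumes the journal lines have the same wording as the log excerpt in this report.

```shell
# Hypothetical helper (name is an assumption, not a systemd or MCO tool).
# Reads journal text on stdin and prints each job that systemd deleted
# to break an ordering cycle, one per line, de-duplicated.
deleted_jobs() {
  sed -n 's/.*: Job \([^ ]*\) deleted to break ordering cycle.*/\1/p' | sort -u
}

# Usage on a live node, current boot only:
#   journalctl -b --no-pager | deleted_jobs
# Usage on a saved journal file from a sosreport (path is up to the reader):
#   deleted_jobs < journal.txt
```

If the output lists a job such as kdump.service/start or network-online.target/start, units ordered after that job may never be started on that boot, which matches the dead kubelet and crio state described above.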
- blocks
-
OCPBUGS-36258 kubelet does not start after reboot due to dependency issue
- Closed
- is cloned by
-
OCPBUGS-36258 kubelet does not start after reboot due to dependency issue
- Closed
- links to
-
RHBA-2024:4316 OpenShift Container Platform 4.16.z bug fix update