Description of problem:
IHAC who recently performed a minor upgrade from 4.14.34 to 4.14.35 across their OpenShift cluster. While the upgrade was successful on the worker nodes, the following issue was observed on the master nodes.

# Issue:
Kubelet and CRI-O services are not starting automatically after a reboot of the master nodes. These services need to be started manually, but the expected behavior is for them to start automatically.

The systemctl status output is as follows:
~~~
○ kubelet.service - Kubernetes Kubelet
     Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; preset: disabled)
    Drop-In: /etc/systemd/system/kubelet.service.d
             └─01-kubens.conf, 10-mco-default-madv.conf, 10-mco-on-prem-wait-resolv.conf, 20-logging.conf, 20-nodenet.conf
     Active: inactive (dead)
----
○ crio.service - Container Runtime Interface for OCI (CRI-O)
     Loaded: loaded (/usr/lib/systemd/system/crio.service; enabled; preset: disabled)
    Drop-In: /etc/systemd/system/crio.service.d
             └─01-kubens.conf, 05-mco-ordering.conf, 10-mco-default-madv.conf, 10-mco-profile-unix-socket.conf, 20-nodenet.conf
     Active: inactive (dead)
       Docs: https://github.com/cri-o/cri-o
~~~
We have confirmed that both services are in an inactive state.
Version-Release number of selected component (if applicable):
How reproducible:
Upgrade cluster from 4.14.34 to 4.14.35
Steps to Reproduce:
Upgrade cluster from 4.14.34 to 4.14.35
Actual results:
After the upgrade reboot, the kubelet and crio services on the master nodes stay inactive. The master nodes roll out with the new version and join the cluster only after kubelet and crio are started manually.
Expected results:
Kubelet and crio services should start automatically after node reboot.
Additional info:
Case id : 03916061
SOS report : https://drive.google.com/file/d/14wG1QTCrg5A27XNhGY4eOIwFl08fiFq6/view?usp=sharing

The kubelet and crio services will not come up unless the services they depend on are up. For example, the nodeip-configuration service has not been triggered yet, and the same goes for the ovs service. There are other services that have not been triggered either.

nodeip-configuration.service loaded inactive dead Writes IP address configuration so that kubelet and crio services select a valid node IP
ovs-configuration.service loaded inactive dead Configures OVS with proper host networking configuration

We have captured a sosreport of the affected node with systemd.log_level=debug, where we can see that the services are not called. The jobs show the messages below. Please help us understand whether there is an issue at the systemd layer, since we see no issues from the OCP side. The service is enabled.

From 0100-sosreport-wce-ocp-bm-02-Case03916061-2024-09-09-xegtrrw.tar.xz:
wce-ocp-bm-02.wce-5g.wirelesstech.charterlab.com

Sep 09 20:52:22 wce-ocp-bm-02.wce-5g.wirelesstech.charterlab.com systemd[1]: ovs-configuration.service: starting held back, waiting for: nodeip-configuration.service
Sep 09 20:52:22 wce-ocp-bm-02.wce-5g.wirelesstech.charterlab.com systemd[1]: nodeip-configuration.service: Child 3801 belongs to nodeip-configuration.service.
Sep 09 20:52:22 wce-ocp-bm-02.wce-5g.wirelesstech.charterlab.com systemd[1]: nodeip-configuration.service: Control process exited, code=exited, status=0/SUCCESS (success)
Sep 09 20:52:22 wce-ocp-bm-02.wce-5g.wirelesstech.charterlab.com systemd[1]: nodeip-configuration.service: Got final SIGCHLD for state start-post.
Sep 09 20:52:22 wce-ocp-bm-02.wce-5g.wirelesstech.charterlab.com systemd[1]: nodeip-configuration.service: Deactivated successfully.
Sep 09 20:52:22 wce-ocp-bm-02.wce-5g.wirelesstech.charterlab.com systemd[1]: nodeip-configuration.service: Service will not restart (restart setting)
Sep 09 20:52:22 wce-ocp-bm-02.wce-5g.wirelesstech.charterlab.com systemd[1]: nodeip-configuration.service: Changed start-post -> dead
Sep 09 20:52:22 wce-ocp-bm-02.wce-5g.wirelesstech.charterlab.com systemd[1]: nodeip-configuration.service: Job 426 nodeip-configuration.service/start finished, result=done
Sep 09 20:52:22 wce-ocp-bm-02.wce-5g.wirelesstech.charterlab.com systemd[1]: Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/nodeip_2dconfiguration_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=1624 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
Sep 09 20:52:22 wce-ocp-bm-02.wce-5g.wirelesstech.charterlab.com systemd[1]: Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/nodeip_2dconfiguration_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=1625 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
Sep 09 20:52:22 wce-ocp-bm-02.wce-5g.wirelesstech.charterlab.com systemd-logind[3533]: Got message type=signal sender=:1.1 destination=n/a path=/org/freedesktop/systemd1/unit/nodeip_2dconfiguration_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=1624 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
Sep 09 20:52:22 wce-ocp-bm-02.wce-5g.wirelesstech.charterlab.com systemd-logind[3533]: Got message type=signal sender=:1.1 destination=n/a path=/org/freedesktop/systemd1/unit/nodeip_2dconfiguration_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=1625 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
Sep 09 20:52:22 wce-ocp-bm-02.wce-5g.wirelesstech.charterlab.com systemd[1]: nodeip-configuration.service: Consumed 455ms CPU time.
Sep 09 20:52:22 wce-ocp-bm-02.wce-5g.wirelesstech.charterlab.com systemd[1]: Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/nodeip_2dconfiguration_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=1627 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
Sep 09 20:52:22 wce-ocp-bm-02.wce-5g.wirelesstech.charterlab.com systemd[1]: Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/nodeip_2dconfiguration_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=1628 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
Sep 09 20:52:22 wce-ocp-bm-02.wce-5g.wirelesstech.charterlab.com systemd-logind[3533]: Got message type=signal sender=:1.1 destination=n/a path=/org/freedesktop/systemd1/unit/nodeip_2dconfiguration_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=1627 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
Sep 09 20:52:22 wce-ocp-bm-02.wce-5g.wirelesstech.charterlab.com systemd[1]: nodeip-configuration.service: Control group is empty.
Sep 09 20:52:22 wce-ocp-bm-02.wce-5g.wirelesstech.charterlab.com systemd-logind[3533]: Got message type=signal sender=:1.1 destination=n/a path=/org/freedesktop/systemd1/unit/nodeip_2dconfiguration_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=1628 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
Sep 09 21:41:39 wce-ocp-bm-02.wce-5g.wirelesstech.charterlab.com systemd[1]: Found unit nodeip-configuration.service at /etc/systemd/system/nodeip-configuration.service (regular file)
Sep 09 21:41:42 wce-ocp-bm-02.wce-5g.wirelesstech.charterlab.com systemd[1]: Found unit nodeip-configuration.service at /etc/systemd/system/nodeip-configuration.service (regular file)

Starting the kubelet service manually works.
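
To make a debug-level journal like the one above easier to scan, a small script along the following lines could list which units were held back and whether the unit they waited for later logged a finished start job. This is only a rough sketch: the function names `summarize` and `still_blocked` are hypothetical, the two regular expressions are modelled solely on the "starting held back" and "Job ... finished" messages quoted in this report, and the sample lines use a shortened placeholder hostname (`host`).

```python
import re

# Patterns modelled on two systemd debug messages quoted in this report.
HELD_BACK = re.compile(r"systemd\[1\]: (\S+): starting held back, waiting for: (\S+)")
JOB_DONE = re.compile(r"systemd\[1\]: (\S+): Job (\d+) (\S+)/(\w+) finished, result=(\S+)")


def summarize(lines):
    """Extract held-back waits and finished jobs from journal lines.

    Returns (waiting, finished):
      waiting  maps a unit to the set of units it was held back for
      finished maps a unit to (job type, result) of its last finished job
    """
    waiting = {}
    finished = {}
    for line in lines:
        m = HELD_BACK.search(line)
        if m:
            waiting.setdefault(m.group(1), set()).add(m.group(2))
            continue
        m = JOB_DONE.search(line)
        if m:
            finished[m.group(3)] = (m.group(4), m.group(5))
    return waiting, finished


def still_blocked(waiting, finished):
    """Units whose awaited dependency never logged a start job with result=done."""
    blocked = {}
    for unit, deps in waiting.items():
        pending = {d for d in deps if finished.get(d) != ("start", "done")}
        if pending:
            blocked[unit] = pending
    return blocked


# Two lines from the excerpt above, with the hostname shortened to "host".
sample = [
    "Sep 09 20:52:22 host systemd[1]: ovs-configuration.service: "
    "starting held back, waiting for: nodeip-configuration.service",
    "Sep 09 20:52:22 host systemd[1]: nodeip-configuration.service: "
    "Job 426 nodeip-configuration.service/start finished, result=done",
]

waiting, finished = summarize(sample)
print(waiting)
print(finished)
print(still_blocked(waiting, finished))
```

For these two sample lines the blocked set comes out empty, because the start job for nodeip-configuration.service finished with result=done after ovs-configuration.service was held back for it. The same functions could be fed the full journal from the sosreport to look for units that were held back and never released.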