OCPBUGS-41968

After a minor upgrade from 4.14.34 to 4.14.35, the kubelet and CRI-O services do not start after a reboot of the RHOCP master node and must be started manually.


    • Critical
    • None
    • False

      Description of problem:

      A customer recently performed a minor upgrade from 4.14.34 to 4.14.35 across their OpenShift cluster. The upgrade succeeded on the worker nodes, but the following issue was observed on the master nodes.
      
      # Issue: Kubelet and CRI-O services are not starting automatically after a reboot of the master nodes. These services need to be started manually, but the expected behavior is for them to start automatically.
      
      The systemctl status output is as follows:
      ~~~
      ○ kubelet.service - Kubernetes Kubelet
           Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; preset: disabled)
          Drop-In: /etc/systemd/system/kubelet.service.d
                   └─01-kubens.conf, 10-mco-default-madv.conf, 10-mco-on-prem-wait-resolv.conf, 20-logging.conf, 20-nodenet.conf
           Active: inactive (dead)
      ----
      ○ crio.service - Container Runtime Interface for OCI (CRI-O)
           Loaded: loaded (/usr/lib/systemd/system/crio.service; enabled; preset: disabled)
          Drop-In: /etc/systemd/system/crio.service.d
                   └─01-kubens.conf, 05-mco-ordering.conf, 10-mco-default-madv.conf, 10-mco-profile-unix-socket.conf, 20-nodenet.conf
           Active: inactive (dead)
             Docs: https://github.com/cri-o/cri-o
      ~~~
      We have confirmed that the services above are in an inactive state.

      Version-Release number of selected component (if applicable):

          

      How reproducible:

      Upgrade cluster from 4.14.34 to 4.14.35

      Steps to Reproduce:

      Upgrade cluster from 4.14.34 to 4.14.35

      Actual results:

      After the upgrade and reboot, the master nodes do not rejoin the cluster until kubelet and CRI-O are started manually.

      Expected results:

          Kubelet and crio services should start automatically after node reboot. 

      Additional info:

      Case id : 03916061
      SOS report : https://drive.google.com/file/d/14wG1QTCrg5A27XNhGY4eOIwFl08fiFq6/view?usp=sharing
      
      
      Kubelet and CRI-O will not come up unless the services they depend on are up.
      
      For example, nodeip-configuration.service has not been triggered yet, and the same goes for ovs-configuration.service.
      Other services have not been triggered either.
      
      nodeip-configuration.service  loaded  inactive  dead  Writes IP address configuration so that kubelet and crio services select a valid node IP
      ovs-configuration.service     loaded  inactive  dead  Configures OVS with proper host networking configuration
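When the unit listing is long, a small script can pick out the rows in this state. The following is a minimal sketch, assuming the rows follow the five-column layout shown above (unit, load, active, sub, description); the names `UnitRow`, `parse_unit_row`, and `inactive_dead` are hypothetical helpers, not part of any OpenShift or systemd tooling. Note that `inactive dead` alone does not tell whether a unit never ran or ran and exited.

```python
from typing import Iterable, List, NamedTuple


class UnitRow(NamedTuple):
    unit: str
    load: str
    active: str
    sub: str
    description: str


def parse_unit_row(line: str) -> UnitRow:
    # Split on runs of whitespace, at most 4 times, so the free-text
    # description stays in one piece. Assumes no leading status marker.
    unit, load, active, sub, description = line.split(None, 4)
    return UnitRow(unit, load, active, sub, description.strip())


def inactive_dead(rows: Iterable[UnitRow]) -> List[str]:
    # Units that are loaded but currently inactive (dead).
    return [
        r.unit
        for r in rows
        if r.load == "loaded" and r.active == "inactive" and r.sub == "dead"
    ]


# Rows copied from the listing above (whitespace shortened).
SAMPLE_ROWS = [
    "nodeip-configuration.service  loaded  inactive dead  "
    "Writes IP address configuration so that kubelet and crio services select a valid node IP",
    "ovs-configuration.service  loaded  inactive dead  "
    "Configures OVS with proper host networking configuration",
]

rows = [parse_unit_row(line) for line in SAMPLE_ROWS]
print(inactive_dead(rows))
```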
      
      We captured an sosreport of the affected node with systemd.log_level=debug, in which we can see that these services are not started.
      The jobs show the output below. Please help us understand whether there is an issue at the systemd layer, since we see no issue on the OCP side.
      The service is enabled. 
      
      From 0100-sosreport-wce-ocp-bm-02-Case03916061-2024-09-09-xegtrrw.tar.xz: 
      
      wce-ocp-bm-02.wce-5g.wirelesstech.charterlab.com
      
      Sep 09 20:52:22 wce-ocp-bm-02.wce-5g.wirelesstech.charterlab.com systemd[1]: ovs-configuration.service: starting held back, waiting for: nodeip-configuration.service
      Sep 09 20:52:22 wce-ocp-bm-02.wce-5g.wirelesstech.charterlab.com systemd[1]: nodeip-configuration.service: Child 3801 belongs to nodeip-configuration.service.
      Sep 09 20:52:22 wce-ocp-bm-02.wce-5g.wirelesstech.charterlab.com systemd[1]: nodeip-configuration.service: Control process exited, code=exited, status=0/SUCCESS (success)
      Sep 09 20:52:22 wce-ocp-bm-02.wce-5g.wirelesstech.charterlab.com systemd[1]: nodeip-configuration.service: Got final SIGCHLD for state start-post.
      Sep 09 20:52:22 wce-ocp-bm-02.wce-5g.wirelesstech.charterlab.com systemd[1]: nodeip-configuration.service: Deactivated successfully.
      Sep 09 20:52:22 wce-ocp-bm-02.wce-5g.wirelesstech.charterlab.com systemd[1]: nodeip-configuration.service: Service will not restart (restart setting)
      Sep 09 20:52:22 wce-ocp-bm-02.wce-5g.wirelesstech.charterlab.com systemd[1]: nodeip-configuration.service: Changed start-post -> dead
      Sep 09 20:52:22 wce-ocp-bm-02.wce-5g.wirelesstech.charterlab.com systemd[1]: nodeip-configuration.service: Job 426 nodeip-configuration.service/start finished, result=done
      Sep 09 20:52:22 wce-ocp-bm-02.wce-5g.wirelesstech.charterlab.com systemd[1]: Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/nodeip_2dconfiguration_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=1624 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
      Sep 09 20:52:22 wce-ocp-bm-02.wce-5g.wirelesstech.charterlab.com systemd[1]: Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/nodeip_2dconfiguration_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=1625 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
      Sep 09 20:52:22 wce-ocp-bm-02.wce-5g.wirelesstech.charterlab.com systemd-logind[3533]: Got message type=signal sender=:1.1 destination=n/a path=/org/freedesktop/systemd1/unit/nodeip_2dconfiguration_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=1624 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
      Sep 09 20:52:22 wce-ocp-bm-02.wce-5g.wirelesstech.charterlab.com systemd-logind[3533]: Got message type=signal sender=:1.1 destination=n/a path=/org/freedesktop/systemd1/unit/nodeip_2dconfiguration_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=1625 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
      Sep 09 20:52:22 wce-ocp-bm-02.wce-5g.wirelesstech.charterlab.com systemd[1]: nodeip-configuration.service: Consumed 455ms CPU time.
      Sep 09 20:52:22 wce-ocp-bm-02.wce-5g.wirelesstech.charterlab.com systemd[1]: Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/nodeip_2dconfiguration_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=1627 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
      Sep 09 20:52:22 wce-ocp-bm-02.wce-5g.wirelesstech.charterlab.com systemd[1]: Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/nodeip_2dconfiguration_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=1628 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
      Sep 09 20:52:22 wce-ocp-bm-02.wce-5g.wirelesstech.charterlab.com systemd-logind[3533]: Got message type=signal sender=:1.1 destination=n/a path=/org/freedesktop/systemd1/unit/nodeip_2dconfiguration_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=1627 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
      Sep 09 20:52:22 wce-ocp-bm-02.wce-5g.wirelesstech.charterlab.com systemd[1]: nodeip-configuration.service: Control group is empty.
      Sep 09 20:52:22 wce-ocp-bm-02.wce-5g.wirelesstech.charterlab.com systemd-logind[3533]: Got message type=signal sender=:1.1 destination=n/a path=/org/freedesktop/systemd1/unit/nodeip_2dconfiguration_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=1628 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
      Sep 09 21:41:39 wce-ocp-bm-02.wce-5g.wirelesstech.charterlab.com systemd[1]: Found unit nodeip-configuration.service at /etc/systemd/system/nodeip-configuration.service (regular file)
      Sep 09 21:41:42 wce-ocp-bm-02.wce-5g.wirelesstech.charterlab.com systemd[1]: Found unit nodeip-configuration.service at /etc/systemd/system/nodeip-configuration.service (regular file)
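To triage a longer debug journal, the two message shapes seen in this excerpt ("starting held back, waiting for" and "Job ... finished, result=...") can be extracted and cross-checked. The sketch below is an illustration only: the regular expressions are written against the lines quoted above, the function names `scan` and `blockers_already_done` are hypothetical, and the sample uses a shortened hostname `node`. It lists units that were held back on a unit whose start job later finished with `result=done`; whether those units were ever started afterwards has to be checked in the rest of the journal.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Patterns modelled on the systemd[1] debug lines quoted in this report.
HELD_BACK = re.compile(
    r"systemd\[1\]: (\S+): starting held back, waiting for: (\S+)"
)
JOB_FINISHED = re.compile(
    r"systemd\[1\]: (\S+): Job (\d+) \S+/(\w+) finished, result=(\S+)"
)


def scan(lines: Iterable[str]) -> Tuple[Dict[str, str], Dict[str, Tuple[str, str]]]:
    held_back: Dict[str, str] = {}             # unit -> unit it waits for
    finished: Dict[str, Tuple[str, str]] = {}  # unit -> (job type, result)
    for line in lines:
        m = HELD_BACK.search(line)
        if m:
            held_back[m.group(1)] = m.group(2)
            continue
        m = JOB_FINISHED.search(line)
        if m:
            finished[m.group(1)] = (m.group(3), m.group(4))
    return held_back, finished


def blockers_already_done(
    held_back: Dict[str, str], finished: Dict[str, Tuple[str, str]]
) -> List[str]:
    # Held-back units whose blocker's start job finished successfully.
    return sorted(
        unit
        for unit, blocker in held_back.items()
        if finished.get(blocker) == ("start", "done")
    )


SAMPLE = [
    "Sep 09 20:52:22 node systemd[1]: ovs-configuration.service: "
    "starting held back, waiting for: nodeip-configuration.service",
    "Sep 09 20:52:22 node systemd[1]: nodeip-configuration.service: "
    "Changed start-post -> dead",
    "Sep 09 20:52:22 node systemd[1]: nodeip-configuration.service: "
    "Job 426 nodeip-configuration.service/start finished, result=done",
]

held_back, finished = scan(SAMPLE)
print(blockers_already_done(held_back, finished))
```

On a full journal, the input could come from a saved log file read line by line; the excerpt here only covers the lines quoted in this report.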
      
      Starting the kubelet service manually works.
      
      
      
      
      

              harpatil@redhat.com Harshal Patil
              rhn-support-shpawar SHUBHAM PAWAR
              Sunil Choudhary Sunil Choudhary