kubelet does not start after reboot due to dependency issue

    • No
    • 255 - Integration & Delivery
    • 1
    • False
      * Previously, for clusters upgraded from earlier versions of {product-title}, enabling `kdump` on an OVN-enabled cluster sometimes prevented the node from rejoining the cluster or returning to the `Ready` state. With this release, stale data from earlier {product-title} versions is removed, so that nodes can now correctly start and rejoin the cluster. (link:https://issues.redhat.com/browse/OCPBUGS-36258[*OCPBUGS-36258*])
    • Bug Fix
    • Done

      This is a clone of issue OCPBUGS-36198. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-33694. The following is the description of the original issue:

      Description of problem:

      kubelet does not start after reboot due to dependency issue

      Version-Release number of selected component (if applicable):

       OCP 4.14.23
        

      How reproducible:

          Every time in the customer's environment

      Steps to Reproduce:

          1. Upgrade an OVN-based OpenShift cluster with kdump enabled to OCP 4.14.23.
          2. Check the status of the kubelet and crio services.
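
A minimal sketch of the status check in step 2, assuming it is run on the affected node. The function name `unit_states` is hypothetical; the only systemd call it relies on is `systemctl is-active`, which prints the unit state and exits non-zero when the unit is not active.

```shell
#!/bin/sh
# Print "<unit>: <state>" for each unit passed as an argument.
# Returns 1 if at least one unit is not active, 0 otherwise.
# (unit_states is a hypothetical helper name, not part of systemd.)
unit_states() {
    rc=0
    for unit in "$@"; do
        # is-active prints the state (active, inactive, failed, ...)
        # and exits non-zero for anything other than active.
        state=$(systemctl is-active "$unit" 2>/dev/null) || rc=1
        printf '%s: %s\n' "$unit" "${state:-unknown}"
    done
    return "$rc"
}

# Example call on a node:
#   unit_states kubelet crio
```

On a node that hits this bug, both units would be expected to report `inactive`, matching the `Active: inactive (dead)` lines in the sos report output.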
          
          

      Actual results:

          The kubelet and crio services are in the dead state and do not start automatically after reboot; manual intervention is needed.
      
      $ cat sos_commands/crio/systemctl_status_crio 
      ○ crio.service - Container Runtime Interface for OCI (CRI-O)
           Loaded: loaded (/usr/lib/systemd/system/crio.service; disabled; preset: disabled)
          Drop-In: /etc/systemd/system/crio.service.d
                   └─01-kubens.conf, 05-mco-ordering.conf, 10-mco-default-env.conf, 10-mco-default-madv.conf, 10-mco-profile-unix-socket.conf, 20-nodenet.conf
           Active: inactive (dead)
             Docs: https://github.com/cri-o/cri-o

      $ cat sos_commands/openshift/systemctl_status_kubelet 
      ○ kubelet.service - Kubernetes Kubelet
           Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; preset: disabled)
          Drop-In: /etc/systemd/system/kubelet.service.d
                   └─01-kubens.conf, 10-mco-default-env.conf, 10-mco-default-madv.conf, 20-logging.conf, 20-nodenet.conf
           Active: inactive (dead)
      

      Expected results:

          kubelet and crio should start automatically.

      Additional info:

      I suspect that the recent patch, which adds a wait for kdump to start, introduced an ordering cycle.
      
      https://github.com/openshift/machine-config-operator/pull/4213/files
      
      May 09 19:12:05 network01 systemd[1]: network-online.target: Found dependency on kdump.service/start
      May 09 19:13:48 network01 systemd[1]: ovs-configuration.service: Found ordering cycle on kdump.service/start
      May 09 19:13:48 network01 systemd[1]: ovs-configuration.service: Job kdump.service/start deleted to break ordering cycle starting with ovs-configuration.service/start
      May 12 21:20:57 network01 systemd[1]: node-valid-hostname.service: Found dependency on kdump.service/start
      May 12 21:21:00 network01 kdumpctl[1389]: kdump: kexec: loaded kdump kernel
      May 12 21:21:00 network01 kdumpctl[1389]: kdump: Starting kdump: [OK]
      May 12 21:25:28 network01 systemd[1]: kdump.service: Found ordering cycle on network-online.target/start
      May 12 21:25:28 network01 systemd[1]: kdump.service: Found dependency on node-valid-hostname.service/start
      May 12 21:25:28 network01 systemd[1]: kdump.service: Found dependency on ovs-configuration.service/start
      May 12 21:25:28 network01 systemd[1]: kdump.service: Found dependency on kdump.service/start
      May 12 21:25:28 network01 systemd[1]: kdump.service: Job network-online.target/start deleted to break ordering cycle starting with kdump.service/start
      May 12 21:25:31 network01 kdumpctl[1284]: kdump: kexec: loaded kdump kernel
      May 12 21:25:31 network01 kdumpctl[1284]: kdump: Starting kdump: [OK]
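
The jobs that systemd dropped can be pulled out of journal text like the excerpt above. The following is a rough sketch, assuming the message format is exactly the "Job ... deleted to break ordering cycle" wording seen in these logs; the function name `deleted_jobs` is hypothetical.

```shell
#!/bin/sh
# Read journal text on stdin and print every job that systemd deleted
# to break an ordering cycle, one per line, sorted and de-duplicated.
# (deleted_jobs is a hypothetical helper name.)
deleted_jobs() {
    sed -n 's/.*: Job \([^ ]*\) deleted to break ordering cycle.*/\1/p' | sort -u
}

# Example call on a node, reading the journal of the current boot:
#   journalctl -b | deleted_jobs
```

Any unit listed here was skipped at boot and may need to be started by hand, or the cycle removed, before dependent services such as kubelet can come up.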
      
      To break a cycle, systemd deletes one of the jobs in the cycle, so the corresponding unit is not started.
      Disabling kdump and rebooting the node works around the problem: kubelet and crio then start automatically.
      
      # systemctl disable kdump
      
      # systemctl reboot
      
      Make sure that systemctl list-jobs shows no pending jobs. Once all jobs have completed, check the status of kubelet.
      
      # systemctl list-jobs
      
      # systemctl status kubelet
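
The wait for pending jobs can be scripted rather than checked by hand. This is a sketch only: the helper names `pending_jobs` and `wait_for_jobs`, the 300-second default timeout, and the 5-second poll interval are all assumptions chosen for illustration. It counts rows of `systemctl list-jobs --no-legend` whose first column is a numeric job ID.

```shell
#!/bin/sh
# Print the number of jobs currently queued in systemd.
# Only rows that start with a numeric job ID are counted, so any
# informational line in the output is ignored.
pending_jobs() {
    systemctl list-jobs --no-legend 2>/dev/null |
        awk '$1 ~ /^[0-9]+$/ { n++ } END { print n + 0 }'
}

# Poll until no jobs are pending. Returns 0 when the queue is empty,
# 1 if jobs are still pending after the timeout.
# $1: timeout in seconds (assumed default: 300)
wait_for_jobs() {
    timeout=${1:-300}
    waited=0
    while [ "$(pending_jobs)" -gt 0 ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 5
        waited=$((waited + 5))
    done
    return 0
}

# Example call on a node, run as root after the reboot:
#   wait_for_jobs 600 && systemctl status kubelet
```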

              team-mco Team MCO
              openshift-crt-jira-prow OpenShift Prow Bot
              Sergio Regidor de la Rosa Sergio Regidor de la Rosa