  1. OpenShift Bugs
  2. OCPBUGS-37696

Kubelet Fails to Auto-Start After Applying Manual Node Sizing in KubeletConfig

    • Important
    • None
    • OCP Node Sprint 259 (Blue)
    • 1
    • False

      Description of problem:

      Setting systemReserved values on worker nodes through a KubeletConfig causes kubelet to fail to start automatically after the node reboots.

      When the MachineConfigPool (MCP) rolls out the change and a node reboots, kubelet can get stuck in the "activating" state. The update completes only after someone logs in to the node over SSH and restarts kubelet manually.

      Nodes are expected to come back online without manual intervention. Because affected nodes currently need a manual kubelet restart, applying the recommended systemReserved values is difficult.
      
      ~~~
      Jul 15 15:48:19 XXX systemd[1]: Stopping Kubernetes Kubelet... 
      Jul 15 15:48:19 XXX kubenswrapper[2016]: I0715 15:48:19.848814    2016 dynamic_cafile_content.go:171] "Shutting down controller" name="client-ca-bundle::/etc/kubernetes/kubelet-ca.crt" 
      Jul 15 15:48:19 XXX systemd[1]: kubelet.service: Succeeded. 
      Jul 15 15:48:19 XXX systemd[1]: Stopped Kubernetes Kubelet. 
      Jul 15 15:48:19 XXX systemd[1]: kubelet.service: Consumed 6h 16min 33.196s CPU time
      
      -- Reboot --
      
      Jul 15 15:49:31 XXX systemd[1]: Dependency failed for Kubernetes Kubelet.
      Jul 15 15:49:31 XXX systemd[1]: kubelet.service: Job kubelet.service/start failed with result 'dependency'.
      Jul 15 16:06:04 XXX systemd[1]: Starting Kubernetes Kubelet... 
      Jul 15 16:06:05 XXX kubenswrapper[2551]: Flag --container-runtime has been deprecated, will be removed in 1.27 as the only valid value is 'remote'
      Jul 15 16:06:05 XXX kubenswrapper[2551]: Flag --minimum-container-ttl-duration has been deprecated, Use --eviction-hard or --eviction-soft instead. Will be removed in a future version.
      ~~~
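
The excerpt above shows the start job failing on a dependency at 15:49:31 and kubelet only starting at 16:06:04. The following is a minimal Python sketch for measuring that gap from journal lines in the same short format. The helper names (`parse_line`, `startup_delay`) and `ASSUMED_YEAR` are hypothetical, and the year is an assumption because the journal's short format omits it.

```python
import re
from datetime import datetime

# Assumption: the short journal format carries no year, so one is supplied
# here only to make the timestamps parseable. It does not affect the delta.
ASSUMED_YEAR = 2024

# Matches lines such as "Jul 15 15:49:31 XXX systemd[1]: <message>".
LINE_RE = re.compile(
    r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ (\S+?)(?:\[\d+\])?: (.*)$"
)


def parse_line(line):
    """Return (timestamp, source, message), or None for non-log lines."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None  # e.g. the "-- Reboot --" separator
    stamp, source, message = match.groups()
    when = datetime.strptime(f"{ASSUMED_YEAR} {stamp}", "%Y %b %d %H:%M:%S")
    return when, source, message


def startup_delay(lines):
    """Seconds between the dependency failure and the next kubelet start."""
    failed_at = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        when, source, message = parsed
        if source != "systemd":
            continue
        if failed_at is None and message.startswith(
            "Dependency failed for Kubernetes Kubelet"
        ):
            failed_at = when
        elif failed_at is not None and message.startswith(
            "Starting Kubernetes Kubelet"
        ):
            return (when - failed_at).total_seconds()
    return None  # no failure followed by a start was found


excerpt = [
    "Jul 15 15:48:19 XXX systemd[1]: Stopped Kubernetes Kubelet.",
    "-- Reboot --",
    "Jul 15 15:49:31 XXX systemd[1]: Dependency failed for Kubernetes Kubelet.",
    "Jul 15 15:49:31 XXX systemd[1]: kubelet.service: Job kubelet.service/start failed with result 'dependency'.",
    "Jul 15 16:06:04 XXX systemd[1]: Starting Kubernetes Kubelet...",
]

print(startup_delay(excerpt))  # 993.0 seconds, i.e. 16 min 33 s
```

For this excerpt, kubelet was down for roughly 16.5 minutes after the reboot. The log does not say which dependency failed, so the sketch only measures the delay and does not identify the cause.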

      Version-Release number of selected component (if applicable):

          4.12.37

      How reproducible:

      I could not reproduce the issue on a lab cluster; everything worked as expected there.

      Steps to Reproduce:

      Create a KubeletConfig that manually allocates resources for nodes:
      
      https://docs.openshift.com/container-platform/4.12/nodes/nodes/nodes-nodes-resources-configuring.html#nodes-nodes-resources-configuring-setting_nodes-nodes-resources-configuring
      
      ~~~
      apiVersion: machineconfiguration.openshift.io/v1
      kind: KubeletConfig
      metadata:
        name: set-allocatable 
      spec:
        machineConfigPoolSelector:
          matchLabels:
            pools.operator.machineconfiguration.openshift.io/worker: "" 
        kubeletConfig:
          systemReserved: 
            cpu: 1000m
            memory: 1Gi
      ~~~
          
      After the KubeletConfig was applied, the /etc/node-sizing.env file on the worker nodes was updated with the desired values.
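
One way to confirm that the file carries the values from the KubeletConfig is to compare its KEY=VALUE pairs against the expected ones. This is a minimal Python sketch, not part of the original report. The key names `SYSTEM_RESERVED_CPU` and `SYSTEM_RESERVED_MEMORY` and the sample file content are assumptions for illustration; check them against the actual file on the node.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"')
    return values


def mismatches(actual, expected):
    """Return {key: (found, wanted)} for every expected key that differs."""
    return {
        key: (actual.get(key), wanted)
        for key, wanted in expected.items()
        if actual.get(key) != wanted
    }


# Hypothetical file content; the real file was not included in the report.
sample = "SYSTEM_RESERVED_CPU=1000m\nSYSTEM_RESERVED_MEMORY=1Gi\n"

# Values taken from the KubeletConfig in the reproduction steps.
expected = {"SYSTEM_RESERVED_CPU": "1000m", "SYSTEM_RESERVED_MEMORY": "1Gi"}

print(mismatches(parse_env(sample), expected))  # {} when everything matches
```

An empty result means the file matches the requested reservation, which is consistent with the observation that the configuration itself was rendered correctly and only the kubelet start failed.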

      Actual results:

      Kubelet does not start after the reboot and has to be started manually on the node over SSH.

      Expected results:

      Kubelet should start automatically and be up and running without manual intervention.

      Additional info:

          

              svanka@redhat.com Sai Ramesh Vanka
              rhn-support-arbhagat Arpit Bhagat
              Sunil Choudhary Sunil Choudhary