Type: Bug
Resolution: Not a Bug
Priority: Major
Fix Version/s: None
Affects Version/s: 4.12.z
Component/s: Node / Kubelet
Labels:
- blue
- kubelet
- kubeletconfig
- triaged

Severity:
Important
Regression:
None
Sprint:
OCP Node Sprint 259 (Blue)
sprint_count:
1
Blocked:
False
Blocked Reason:
None

Description of problem:

Manually setting systemReserved values on nodes through a KubeletConfig results in kubelet failing to start automatically after the node reboots.

When the MachineConfigPool (MCP) rolls the change out and a node reboots, kubelet is stuck in the "activating" state. The update completes only after manual intervention (SSH into the node and restart kubelet).

The expected behavior is for nodes to come back online without manual intervention. Because each node currently needs a manual kubelet restart, applying the recommended systemReserved values is difficult.

~~~
Jul 15 15:48:19 XXX systemd[1]: Stopping Kubernetes Kubelet... 
Jul 15 15:48:19 XXX kubenswrapper[2016]: I0715 15:48:19.848814    2016 dynamic_cafile_content.go:171] "Shutting down controller" name="client-ca-bundle::/etc/kubernetes/kubelet-ca.crt" 
Jul 15 15:48:19 XXX systemd[1]: kubelet.service: Succeeded. 
Jul 15 15:48:19 XXX systemd[1]: Stopped Kubernetes Kubelet. 
Jul 15 15:48:19 XXX systemd[1]: kubelet.service: Consumed 6h 16min 33.196s CPU time

-- Reboot --

Jul 15 15:49:31 XXX systemd[1]: Dependency failed for Kubernetes Kubelet.
Jul 15 15:49:31 XXX systemd[1]: kubelet.service: Job kubelet.service/start failed with result 'dependency'.
Jul 15 16:06:04 XXX systemd[1]: Starting Kubernetes Kubelet... 
Jul 15 16:06:05 XXX kubenswrapper[2551]: Flag --container-runtime has been deprecated, will be removed in 1.27 as the only valid value is 'remote'
Jul 15 16:06:05 XXX kubenswrapper[2551]: Flag --minimum-container-ttl-duration has been deprecated, Use --eviction-hard or --eviction-soft instead. Will be removed in a future version.
~~~
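To quantify how long kubelet stayed down after the reboot, the journal excerpt above can be scanned for the dependency failure and the next kubelet start. The following is only a rough sketch: the function names are hypothetical, the year is an assumption (the journal lines carry none; 2024 is taken from the issue creation date), and the sample text is just the three relevant lines copied from the log.

```python
import re
from datetime import datetime

# Lines copied from the journal excerpt in this report (hostname redacted as XXX).
JOURNAL = """\
Jul 15 15:49:31 XXX systemd[1]: Dependency failed for Kubernetes Kubelet.
Jul 15 15:49:31 XXX systemd[1]: kubelet.service: Job kubelet.service/start failed with result 'dependency'.
Jul 15 16:06:04 XXX systemd[1]: Starting Kubernetes Kubelet...
"""

# timestamp, host, process name with optional [pid], message
LINE_RE = re.compile(r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ (\S+?)(?:\[\d+\])?: (.*)$")


def parse_line(line, year=2024):
    """Split one journal line into (timestamp, process, message).

    The year is an assumption: short journal timestamps do not include it.
    """
    match = LINE_RE.match(line)
    if match is None:
        return None
    stamp = datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
    return stamp, match.group(2), match.group(3)


def kubelet_start_gap(text):
    """Return the time between the failed start job and the next kubelet start."""
    failed_at = None
    for line in text.splitlines():
        parsed = parse_line(line)
        if parsed is None:
            continue
        stamp, process, message = parsed
        if process != "systemd":
            continue
        if "failed with result 'dependency'" in message:
            failed_at = stamp
        elif failed_at is not None and message.startswith("Starting Kubernetes Kubelet"):
            return stamp - failed_at
    return None


if __name__ == "__main__":
    print(kubelet_start_gap(JOURNAL))
```

For the excerpt in this report, the gap is the interval between 15:49:31 and 16:06:04, which matches the window in which kubelet was started manually. The script does not identify which dependency failed; that would need the journal of the units kubelet.service depends on.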

Version-Release number of selected component (if applicable):

    4.12.37

How reproducible:

I couldn't reproduce the issue on the lab cluster; everything worked as expected on my cluster.

Steps to Reproduce:

Create a KubeletConfig that manually allocates resources for nodes, following the documentation:

https://docs.openshift.com/container-platform/4.12/nodes/nodes/nodes-nodes-resources-configuring.html#nodes-nodes-resources-configuring-setting_nodes-nodes-resources-configuring

~~~
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: set-allocatable 
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: "" 
  kubeletConfig:
    systemReserved: 
      cpu: 1000m
      memory: 1Gi
~~~
    
After applying the KubeletConfig, the /etc/node-sizing.env file on the worker nodes was updated with the desired values.
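The check on /etc/node-sizing.env can be scripted once the file content has been copied off a node. This is a minimal sketch only: the key names SYSTEM_RESERVED_CPU and SYSTEM_RESERVED_MEMORY and the sample content are assumptions for illustration, not output captured from the affected cluster, and the helper names are hypothetical.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"')
    return values


def mismatches(env, expected):
    """Return {key: (actual, wanted)} for every key that differs from expected."""
    return {
        key: (env.get(key), wanted)
        for key, wanted in expected.items()
        if env.get(key) != wanted
    }


# Hypothetical file content; the key names are assumed, not confirmed by this report.
SAMPLE = """\
SYSTEM_RESERVED_MEMORY=1Gi
SYSTEM_RESERVED_CPU=1000m
"""

# Values taken from the KubeletConfig in the reproduction steps.
EXPECTED = {"SYSTEM_RESERVED_CPU": "1000m", "SYSTEM_RESERVED_MEMORY": "1Gi"}

if __name__ == "__main__":
    print(mismatches(parse_env(SAMPLE), EXPECTED) or "values match")
```

An empty result only confirms that the file carries the requested values; it says nothing about whether kubelet started with them.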

Actual results:

Kubelet has to be started manually on the node over SSH.

Expected results:

Kubelet should be up and running automatically, without manual intervention.

Additional info:

Assignee: Sai Ramesh Vanka

Reporter: Arpit Bhagat

QA Contact: Sunil Choudhary

Votes: 0

Watchers: 3

Created: 2024/07/30 8:11 AM

Updated: 2024/09/19 12:46 PM

Resolved: 2024/09/19 12:46 PM
