Steps to reproduce:
1. Create a KubeletConfig CR that configures garbage collection for containers and images, and introduce a typo:
# vim kubeletconfig.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: worker-kubeconfig
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ""
  kubeletConfig:
    evictionSoft:
      memory.available: "500Mi"
      nodesfs.available: "10%"     -----> Made a typo here
      nodefs.inodesFree: "5%"
      imagefs.available: "15%"
      imagefs.inodesFree: "10%"
    evictionSoftGracePeriod:
      memory.available: "1m30s"
      nodefs.available: "1m30s"
      nodefs.inodesFree: "1m30s"
      imagefs.available: "1m30s"
      imagefs.inodesFree: "1m30s"
    evictionHard:
      memory.available: "200Mi"
      nodefs.available: "5%"
      nodefs.inodesFree: "4%"
      imagefs.available: "10%"
      imagefs.inodesFree: "5%"
    evictionPressureTransitionPeriod: 0s
    imageMinimumGCAge: 5m
    imageGCHighThresholdPercent: 80
    imageGCLowThresholdPercent: 75
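For reference, the eviction signal names kubelet accepts are memory.available, nodefs.available, nodefs.inodesFree, imagefs.available, imagefs.inodesFree, and pid.available. The misspelled nodesfs.available matches none of these, which is exactly what kubelet rejects in step 4 below.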
2. Apply the kubeletconfig CR:
# oc apply -f kubeletconfig.yaml
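Note that a dry run of this step (oc apply -f kubeletconfig.yaml --dry-run=server) does not catch the problem: as this reproduction shows, the API server accepts the CR, and the invalid signal name is only rejected by kubelet itself once the rendered config reaches the node.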
3. Check the MCP progress and node state:
$ oc get mcp worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-3aa629bea780edf9b271a37fb54dc00f   False     True       False      2              0                   0                     0                      19h

$ oc get nodes worker01
NAME       STATUS                        ROLES    AGE   VERSION
worker01   NotReady,SchedulingDisabled   worker   19h   v1.27.6+1648878
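The rollout can be followed with oc get mcp worker -w; UPDATEDMACHINECOUNT never advances past 0 here because the first node to receive the new rendered config fails to rejoin the cluster.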
4. Check the kubelet logs:
Oct 04 13:36:36 worker01 kubenswrapper[8431]: E1004 13:36:36.418493 8431 run.go:74] "command failed" err="failed to run Kubelet: failed to create kubelet: unsupported eviction signal nodesfs.available"
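(Since the kubelet itself is down, fetching these logs through oc adm node-logs worker01 -u kubelet will typically fail; the message above was taken from the node's journal, e.g. journalctl -u kubelet over SSH or a console session.)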
The objective of this RFE:
A dry run should be performed before the changes are actually pushed at the node level, to ensure that the KubeletConfig CR changes will not bring the kubelet to a dead state.
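As a minimal sketch of what such a pre-flight check could look like, assuming it validates threshold keys against the fixed set of eviction signals kubelet supports (this is illustrative only, not existing MCO code; the names supportedEvictionSignals and validateEvictionSignals are hypothetical):

    package main

    import (
        "fmt"
        "os"
    )

    // Eviction signals kubelet accepts for evictionSoft/evictionHard,
    // per the upstream node-pressure eviction documentation.
    var supportedEvictionSignals = map[string]bool{
        "memory.available":   true,
        "nodefs.available":   true,
        "nodefs.inodesFree":  true,
        "imagefs.available":  true,
        "imagefs.inodesFree": true,
        "pid.available":      true,
    }

    // validateEvictionSignals returns an error for any unknown signal key,
    // the same condition that made kubelet exit in step 4.
    func validateEvictionSignals(thresholds map[string]string) error {
        for signal := range thresholds {
            if !supportedEvictionSignals[signal] {
                return fmt.Errorf("unsupported eviction signal %s", signal)
            }
        }
        return nil
    }

    func main() {
        // The evictionSoft block from the CR above, including the typo.
        evictionSoft := map[string]string{
            "memory.available":   "500Mi",
            "nodesfs.available":  "10%", // typo: should be nodefs.available
            "nodefs.inodesFree":  "5%",
            "imagefs.available":  "15%",
            "imagefs.inodesFree": "10%",
        }
        if err := validateEvictionSignals(evictionSoft); err != nil {
            fmt.Fprintln(os.Stderr, "kubeletconfig dry-run failed:", err)
            os.Exit(1) // reject the CR before any node is drained or rebooted
        }
        fmt.Println("kubeletconfig dry-run passed")
    }

Run against the evictionSoft block above, this fails with the same "unsupported eviction signal nodesfs.available" message that kubelet logged in step 4, but before any node is drained or rebooted.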
Impact:
Any typo in the KubeletConfig CR can bring a node into a NotReady state if a configuration parameter is invalid. This has a severe impact on clusters running 100+ worker nodes, where the MachineConfigPool applies the changes to the nodes in batches of 5 or 10 at a time.