Type: Feature Request
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: openshift-4.12, openshift-4.13, openshift-4.14, openshift-4.15, openshift-4.16, openshift-4.17
Component/s: MCO, Node
Labels:
- cee.neXT
- kubeletconfig
- mco
- node

Blocked:
False
Blocked Reason:
None
Ready:
False
Color Status:
Not Selected

Steps to reproduce:

1. Create a kubeletconfig CR that configures garbage collection for containers and images, with a typo in one eviction signal name:

# vim kubeletconfig.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: worker-kubeconfig 
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: "" 
  kubeletConfig:
    evictionSoft: 
      memory.available: "500Mi" 
      nodesfs.available: "10%"   # <-- typo made here (should be nodefs.available)
      nodefs.inodesFree: "5%"
      imagefs.available: "15%"
      imagefs.inodesFree: "10%"
    evictionSoftGracePeriod:  
      memory.available: "1m30s"
      nodefs.available: "1m30s"
      nodefs.inodesFree: "1m30s"
      imagefs.available: "1m30s"
      imagefs.inodesFree: "1m30s"
    evictionHard: 
      memory.available: "200Mi"
      nodefs.available: "5%"
      nodefs.inodesFree: "4%"
      imagefs.available: "10%"
      imagefs.inodesFree: "5%"
    evictionPressureTransitionPeriod: 0s 
    imageMinimumGCAge: 5m 
    imageGCHighThresholdPercent: 80 
    imageGCLowThresholdPercent: 75

2. Apply the kubeletconfig CR:

# oc apply -f kubeletconfig.yaml

3. Check the MCP progress and node state:

$ oc get mcp worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-3aa629bea780edf9b271a37fb54dc00f   False     True       False      2              0                   0                     0                      19h
$ oc get nodes worker01
worker01   NotReady,SchedulingDisabled   worker                 19h   v1.27.6+1648878

4. Check the kubelet logs:

Oct 04 13:36:36 worker01 kubenswrapper[8431]: E1004 13:36:36.418493    8431 run.go:74] "command failed" err="failed to run Kubelet: failed to create kubelet: unsupported eviction signal nodesfs.available"
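
For triage, the offending key can be pulled out of a log line like the one above and compared against the expected signal names. This is only a minimal sketch: `KNOWN_SIGNALS` and `diagnose` are hypothetical names, and the allow-list is an assumption built from the signals used in this manifest plus `pid.available`; the authoritative list is whatever the kubelet version on the node accepts.

```python
import difflib
import re

# Assumption: allow-list taken from the manifest in this issue plus pid.available.
# The kubelet on the node is the real authority on which signals are supported.
KNOWN_SIGNALS = [
    "memory.available",
    "nodefs.available",
    "nodefs.inodesFree",
    "imagefs.available",
    "imagefs.inodesFree",
    "pid.available",
]

# Matches the error text seen in the kubelet log and captures the signal name.
LOG_PATTERN = re.compile(r"unsupported eviction signal ([\w.]+)")


def diagnose(log_line):
    """Return (bad_signal, closest_known_signal_or_None), or None if no match."""
    match = LOG_PATTERN.search(log_line)
    if match is None:
        return None
    bad = match.group(1)
    suggestions = difflib.get_close_matches(bad, KNOWN_SIGNALS, n=1)
    return bad, (suggestions[0] if suggestions else None)


if __name__ == "__main__":
    line = (
        'E1004 13:36:36.418493    8431 run.go:74] "command failed" '
        'err="failed to run Kubelet: failed to create kubelet: '
        'unsupported eviction signal nodesfs.available"'
    )
    print(diagnose(line))
```
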

The objective of this RFE:

Before changes are pushed to the nodes, a dry run should validate the kubeletconfig CR so that an invalid change cannot leave the kubelet in a dead state.
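
To illustrate the kind of check being requested, here is a rough client-side sketch and not how the MCO works today. `precheck`, `KNOWN_SIGNALS`, and `THRESHOLD_FIELDS` are hypothetical names, the allow-list is an assumption (signals from this manifest plus `pid.available`), and the grace-period rule reflects my understanding that the kubelet expects a grace period for every soft threshold. A real dry run would ideally reuse the kubelet's own validation.

```python
# Assumption: allow-list taken from the manifest in this issue plus pid.available.
KNOWN_SIGNALS = frozenset({
    "memory.available",
    "nodefs.available",
    "nodefs.inodesFree",
    "imagefs.available",
    "imagefs.inodesFree",
    "pid.available",
})

# Fields under spec.kubeletConfig whose keys are eviction signal names.
THRESHOLD_FIELDS = ("evictionSoft", "evictionSoftGracePeriod", "evictionHard")


def precheck(kubelet_config):
    """Return a list of problems found in a spec.kubeletConfig mapping."""
    problems = []
    for field in THRESHOLD_FIELDS:
        for signal in kubelet_config.get(field, {}):
            if signal not in KNOWN_SIGNALS:
                problems.append(f"{field}: unsupported eviction signal {signal}")
    # Assumed rule: each valid soft threshold needs a matching grace period.
    grace = kubelet_config.get("evictionSoftGracePeriod", {})
    for signal in kubelet_config.get("evictionSoft", {}):
        if signal in KNOWN_SIGNALS and signal not in grace:
            problems.append(f"evictionSoft: no grace period for {signal}")
    return problems


if __name__ == "__main__":
    # Abbreviated version of the manifest from the steps to reproduce.
    config = {
        "evictionSoft": {"memory.available": "500Mi", "nodesfs.available": "10%"},
        "evictionSoftGracePeriod": {
            "memory.available": "1m30s",
            "nodefs.available": "1m30s",
        },
        "evictionHard": {"memory.available": "200Mi", "nodefs.available": "5%"},
    }
    for problem in precheck(config):
        print(problem)
```

An empty result would mean the CR passed this limited check; anything else should stop the rollout before the first node is drained.
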

Impact:

Any typo in a configuration parameter of the kubeletconfig CR can bring a node into a NotReady state. The impact is severe on clusters running 100+ worker nodes, where the MachineConfigPool applies the changes to the nodes in batches of 5 or 10 at a time.

Assignee: Mark Russell

Reporter: Divyam Pateriya

Need Info From: Mark Russell

Votes: 10

Watchers: 3

Created: 2023/10/04 2:42 PM

Updated: 2024/10/21 5:42 AM
