
OCPBUGS-16769: Node Tuning Operator is updating the kubelet config without any admin action



      Description of problem:

      The Node Tuning Operator updated the KubeletConfig of the cluster. This caused the cluster to start a Machine Config rollout without the user taking any action. In more detail:
      
      The Node Tuning Operator updates the kubelet-config:
      ~~~
      2023-07-18T21:23:27.048335958+03:00 I0718 18:23:27.048303       1 resources.go:159] Update kubelet-config "performance-worker"
      ~~~
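      This log line comes from the operator pod; a similar check can be run against a live cluster (a minimal sketch, assuming the standard NTO namespace and deployment names):
      ~~~
      # Look for unexpected kubelet-config updates in the Node Tuning Operator logs
      oc -n openshift-cluster-node-tuning-operator logs deployment/cluster-node-tuning-operator | grep 'Update kubelet-config'
      ~~~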
      
      This is reflected in the managedFields of the KubeletConfig CR:
      ~~~
          manager: cluster-node-tuning-operator
           operation: Update
           time: "2023-07-18T18:23:27Z"
      ~~~
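      The same ownership information can be inspected directly on the CR, for example:
      ~~~
      # Show which field manager last wrote each field of the KubeletConfig CR
      oc get kubeletconfig performance-worker -o yaml --show-managed-fields
      ~~~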
      A new "99-worker-generated-kubelet" MC is getting created with name "99-worker-generated-kubelet-1" from the Machine Config Controller.
      
      The controller then generates a new rendered MC that includes the new kubelet-config MC:
      ~~~
      2023-07-18T21:26:24.722890720+03:00 I0718 18:26:24.722788       1 render_controller.go:510] Generated machineconfig rendered-worker-0d708d32a1e04ec8b02986c40d10828b from 16 configs: 
      [{MachineConfig  00-worker  
      machineconfiguration.openshift.io/v1  } {MachineConfig  01-worker-container-runtime  
      
      machineconfiguration.openshift.io/v1  } {MachineConfig  01-worker-kubelet  
      machineconfiguration.openshift.io/v1  } {MachineConfig  50-nto-worker  
      machineconfiguration.openshift.io/v1  } {MachineConfig  50-performance-worker  
      machineconfiguration.openshift.io/v1  } {MachineConfig  99-worker-chrony-conf-override  
      machineconfiguration.openshift.io/v1  } {MachineConfig  99-worker-generated-containerruntime  
      machineconfiguration.openshift.io/v1  } {MachineConfig  99-worker-generated-kubelet  
      machineconfiguration.openshift.io/v1  } {MachineConfig  99-worker-generated-kubelet-1  <---------------------------- This one
      machineconfiguration.openshift.io/v1  } {MachineConfig  99-worker-generated-registries  
      machineconfiguration.openshift.io/v1  } {MachineConfig  99-worker-legacy-kdump-configuration  
      machineconfiguration.openshift.io/v1  } {MachineConfig  99-worker-ssh  
      machineconfiguration.openshift.io/v1  } {MachineConfig  coredump-mc-worker  
      machineconfiguration.openshift.io/v1  } {MachineConfig  softpanic-worker  
      machineconfiguration.openshift.io/v1  } {MachineConfig  worker-custom-timezone-configuration  
      machineconfiguration.openshift.io/v1  } {MachineConfig  worker-std-r750-1-combo-sctp  
      machineconfiguration.openshift.io/v1  }]
      2023-07-18T21:26:24.723116641+03:00 I0718 18:26:24.723021       1 event.go:285] Event(v1.ObjectReference{Kind:"MachineConfigPool", Namespace:"", Name:"worker", UID:"e104aa2e-0729-49ae-8022-502cbb414ab1", APIVersion:"machineconfiguration.
      openshift.io/v1", ResourceVersion:"87008874", FieldPath:""}): type: 'Normal' reason: 'RenderedConfigGenerated' rendered-worker-0d708d32a1e04ec8b02986c40d10828b successfully generated (release version: 4.12.8, controller version: 731341b8
      9e72d53abb349aff98d09e281e471066)
      ~~~
      This rendered MC is then applied to the cluster.
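      Which rendered MC the worker pool currently targets can be verified with:
      ~~~
      # Rendered config currently targeted by the worker pool
      oc get mcp worker -o jsonpath='{.spec.configuration.name}{"\n"}'
      ~~~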
      
      The differences between the previous and the new rendered MC are as follows:
      ~~~
      [nstamate@fedora ~]$ diff rendered-worker-0d708d32a1e04ec8b02986c40d10828b-kubelet-config.yaml rendered-worker-1102262599b5c40ca2c2aae79f1b35d0-kubelet-config.yaml 
      48c48,54
      <   "cpuManagerReconcilePeriod": "0s",
      ---
      >   "cpuManagerPolicy": "static",
      >   "cpuManagerPolicyOptions": {
      >     "full-pcpus-only": "true"
      >   },
      >   "cpuManagerReconcilePeriod": "5s",
      >   "memoryManagerPolicy": "Static",
      >   "topologyManagerPolicy": "single-numa-node",
      54a61,66
      >   "evictionHard": {
      >     "imagefs.available": "15%",
      >     "memory.available": "100Mi",
      >     "nodefs.available": "10%",
      >     "nodefs.inodesFree": "5%"
      >   },
      65,68c77,80
      <   "allowedUnsafeSysctls": [
      <     "fs.mqueue.*",
      <     "net.*"
      <   ],
      ---
      >   "kubeReserved": {
      >     "memory": "500Mi"
      >   },
      >   "reservedSystemCPUs": "0-3,52-55",
      79c91,99
      <   "shutdownGracePeriodCriticalPods": "0s"
      ---
      >   "shutdownGracePeriodCriticalPods": "0s",
      >   "reservedMemory": [
      >     {
      >       "numaNode": 0,
      >       "limits": {
      >         "memory": "1100Mi"
      >       }
      >     }
      >   ]
      ~~~
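      For reference, a diff like the one above can be reproduced by extracting the embedded kubelet config from each rendered MC. A minimal sketch, assuming the config is carried as a URL-encoded data: URI in the Ignition file at /etc/kubernetes/kubelet.conf (a base64-encoded source would need base64 -d instead of the URL-decode step):
      ~~~
      for mc in rendered-worker-0d708d32a1e04ec8b02986c40d10828b rendered-worker-1102262599b5c40ca2c2aae79f1b35d0; do
        oc get mc "$mc" -o json \
          | jq -r '.spec.config.storage.files[] | select(.path=="/etc/kubernetes/kubelet.conf") | .contents.source' \
          | sed 's|^data:,||; s|^data:text/plain,||' \
          | python3 -c 'import sys,urllib.parse; print(urllib.parse.unquote(sys.stdin.read()))' \
          > "$mc-kubelet-config.yaml"
      done
      diff rendered-worker-0d708d32a1e04ec8b02986c40d10828b-kubelet-config.yaml rendered-worker-1102262599b5c40ca2c2aae79f1b35d0-kubelet-config.yaml
      ~~~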
      Then the changes are reverted automatically:
      ~~~
      2023-07-19T10:39:09.804948915+03:00 I0719 07:39:09.804900       1 resources.go:159] Update kubelet-config "performance-worker"
      ~~~
      The Machine Config Controller then targets the previous rendered MC again.
      
      A new rollout starts.
      
      The admin didn't touch anything.
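      While this is happening, the rollout can be observed on the pool (the events namespace is an assumption; events for cluster-scoped objects usually land in "default"):
      ~~~
      # Watch the worker pool cycle through the unexpected rollout
      oc get mcp worker -w
      # Recent MachineConfigPool events, including the RenderedConfigGenerated ones quoted above
      oc get events -n default --field-selector involvedObject.kind=MachineConfigPool
      ~~~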
      
      Must-gather and other manifests are in the attached case.

      Version-Release number of selected component (if applicable):

      N/A (the quoted RenderedConfigGenerated event reports release version 4.12.8)

      How reproducible:

      Cannot reproduce; it has happened only once.

      Steps to Reproduce:

      N/A
      

      Actual results:

      The Kubelet config is changed by the Node Tuning Operator at random times.

      Expected results:

      The Kubelet config should not be changed by the Node Tuning Operator at random times.

      Additional info:

       

            Assignee: Martin Sivak (msivak@redhat.com)
            Reporter: Nikolaos Stamatelopoulos (rhn-support-nstamate)
            QA Contact: Mallapadi Niranjan