Type: Bug
Resolution: Cannot Reproduce
Priority: Major
Affects Version: 4.12.z
Category: Quality / Stability / Reliability
Description of problem:
The Node Tuning Operator updated the KubeletConfig of the cluster. This caused the cluster to start a Machine Config rollout without any user action. In more detail:
The Node Tuning Operator updates the kubelet-config:
~~~
2023-07-18T21:23:27.048335958+03:00 I0718 18:23:27.048303 1 resources.go:159] Update kubelet-config "performance-worker"
~~~
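For reference, the operator log above can be pulled with a standard command (a minimal sketch; the namespace and deployment name are the defaults for the Node Tuning Operator):
~~~
oc logs -n openshift-cluster-node-tuning-operator deployment/cluster-node-tuning-operator | grep 'Update kubelet-config'
~~~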
This is reflected in the managedFields of the KubeletConfig CR:
~~~
manager: cluster-node-tuning-operator
operation: Update
time: "2023-07-18T18:23:27Z"
~~~
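The same managedFields entry can be inspected on the live object, assuming an oc version that supports the --show-managed-fields flag:
~~~
oc get kubeletconfig performance-worker -o yaml --show-managed-fields | grep -B1 -A2 cluster-node-tuning-operator
~~~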
A new "99-worker-generated-kubelet" MC is getting created with name "99-worker-generated-kubelet-1" from the Machine Config Controller.
A new rendered MC that includes the new kubeletconfig MC is getting created from the controller.
~~~
2023-07-18T21:26:24.722890720+03:00 I0718 18:26:24.722788 1 render_controller.go:510] Generated machineconfig rendered-worker-0d708d32a1e04ec8b02986c40d10828b from 16 configs:
[{MachineConfig 00-worker
machineconfiguration.openshift.io/v1 } {MachineConfig 01-worker-container-runtime
machineconfiguration.openshift.io/v1 } {MachineConfig 01-worker-kubelet
machineconfiguration.openshift.io/v1 } {MachineConfig 50-nto-worker
machineconfiguration.openshift.io/v1 } {MachineConfig 50-performance-worker
machineconfiguration.openshift.io/v1 } {MachineConfig 99-worker-chrony-conf-override
machineconfiguration.openshift.io/v1 } {MachineConfig 99-worker-generated-containerruntime
machineconfiguration.openshift.io/v1 } {MachineConfig 99-worker-generated-kubelet
machineconfiguration.openshift.io/v1 } {MachineConfig 99-worker-generated-kubelet-1 <---------------------------- This one
machineconfiguration.openshift.io/v1 } {MachineConfig 99-worker-generated-registries
machineconfiguration.openshift.io/v1 } {MachineConfig 99-worker-legacy-kdump-configuration
machineconfiguration.openshift.io/v1 } {MachineConfig 99-worker-ssh
machineconfiguration.openshift.io/v1 } {MachineConfig coredump-mc-worker
machineconfiguration.openshift.io/v1 } {MachineConfig softpanic-worker
machineconfiguration.openshift.io/v1 } {MachineConfig worker-custom-timezone-configuration
machineconfiguration.openshift.io/v1 } {MachineConfig worker-std-r750-1-combo-sctp
machineconfiguration.openshift.io/v1 }]
2023-07-18T21:26:24.723116641+03:00 I0718 18:26:24.723021 1 event.go:285] Event(v1.ObjectReference{Kind:"MachineConfigPool", Namespace:"", Name:"worker", UID:"e104aa2e-0729-49ae-8022-502cbb414ab1", APIVersion:"machineconfiguration.
openshift.io/v1", ResourceVersion:"87008874", FieldPath:""}): type: 'Normal' reason: 'RenderedConfigGenerated' rendered-worker-0d708d32a1e04ec8b02986c40d10828b successfully generated (release version: 4.12.8, controller version: 731341b8
9e72d53abb349aff98d09e281e471066)
~~~
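As a sanity check, the rendered config that the worker pool targets (spec) versus the one it has rolled out (status) can be read from standard MachineConfigPool fields:
~~~
oc get mcp worker -o jsonpath='{.spec.configuration.name}{"\n"}'
oc get mcp worker -o jsonpath='{.status.configuration.name}{"\n"}'
~~~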
This rendered MC is applied to the cluster.
The differences between the previous and the new rendered MC are as follows:
~~~
[nstamate@fedora ~]$ diff rendered-worker-0d708d32a1e04ec8b02986c40d10828b-kubelet-config.yaml rendered-worker-1102262599b5c40ca2c2aae79f1b35d0-kubelet-config.yaml
48c48,54
< "cpuManagerReconcilePeriod": "0s",
---
> "cpuManagerPolicy": "static",
> "cpuManagerPolicyOptions": {
> "full-pcpus-only": "true"
> },
> "cpuManagerReconcilePeriod": "5s",
> "memoryManagerPolicy": "Static",
> "topologyManagerPolicy": "single-numa-node",
54a61,66
> "evictionHard": {
> "imagefs.available": "15%",
> "memory.available": "100Mi",
> "nodefs.available": "10%",
> "nodefs.inodesFree": "5%"
> },
65,68c77,80
< "allowedUnsafeSysctls": [
< "fs.mqueue.*",
< "net.*"
< ],
---
> "kubeReserved": {
> "memory": "500Mi"
> },
> "reservedSystemCPUs": "0-3,52-55",
79c91,99
< "shutdownGracePeriodCriticalPods": "0s"
---
> "shutdownGracePeriodCriticalPods": "0s",
> "reservedMemory": [
> {
> "numaNode": 0,
> "limits": {
> "memory": "1100Mi"
> }
> }
> ]
~~~
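For context, the two *-kubelet-config.yaml files above were presumably extracted from the rendered MCs. A minimal sketch of one way to do this (it assumes the kubelet configuration is stored at /etc/kubernetes/kubelet.conf and is URL-encoded in the data URL; if the source is base64-encoded, decode with base64 -d instead):
~~~
oc get mc rendered-worker-0d708d32a1e04ec8b02986c40d10828b \
  -o jsonpath='{.spec.config.storage.files[?(@.path=="/etc/kubernetes/kubelet.conf")].contents.source}' \
  | sed 's/^data:,//' \
  | python3 -c 'import sys,urllib.parse; print(urllib.parse.unquote(sys.stdin.read()))' \
  > rendered-worker-0d708d32a1e04ec8b02986c40d10828b-kubelet-config.yaml
~~~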
The changes are then reverted automatically:
~~~
2023-07-19T10:39:09.804948915+03:00 I0719 07:39:09.804900 1 resources.go:159] Update kubelet-config "performance-worker"
~~~
The Machine Config Controller then targets the previous rendered MC, and a new rollout starts. The admin did not perform any action.
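After the fact, the unexpected renders and rollouts can be correlated from events and pool status (a sketch, filtering by the event reason seen in the controller log above; note that events are short-lived, so this only helps close to the incident):
~~~
oc get events -A | grep RenderedConfigGenerated
oc get mcp worker
~~~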
Must-gather and other manifests are in the attached case.
Version-Release number of selected component (if applicable):
N/A
How reproducible:
Cannot reproduce; it happened only once.
Steps to Reproduce:
1.
2.
3.
Actual results:
The kubelet config is changed by the Node Tuning Operator at random times, triggering unrequested Machine Config rollouts.
Expected results:
The kubelet config should not be changed by the Node Tuning Operator at random times.
Additional info: