OCPBUGS-16769
Node Tuning Operator is updating the kubelet config without any admin action


      Description of problem:

      The Node Tuning Operator updated the KubeletConfig of the cluster. This caused the cluster to start a Machine Config rollout without any admin action. In more detail:
      
      The Node Tuning Operator updates the kubelet-config:
      ~~~
      2023-07-18T21:23:27.048335958+03:00 I0718 18:23:27.048303       1 resources.go:159] Update kubelet-config "performance-worker"
      ~~~
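      
      For reference, the NTO log above can be pulled with the standard operator log command (a minimal sketch; the namespace and deployment names are the 4.12 defaults):
      ~~~
      # View the Node Tuning Operator logs:
      oc logs -n openshift-cluster-node-tuning-operator deployment/cluster-node-tuning-operator
      ~~~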
      
      From the KubeletConfig CR (managedFields):
      ~~~
          manager: cluster-node-tuning-operator
          operation: Update
          time: "2023-07-18T18:23:27Z"
      ~~~
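      The managedFields entry can be inspected directly to confirm the field manager (standard oc/kubectl flag):
      ~~~
      # Show which field manager last updated the KubeletConfig:
      oc get kubeletconfig performance-worker -o yaml --show-managed-fields
      ~~~
      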
      A new "99-worker-generated-kubelet" MC is getting created with name "99-worker-generated-kubelet-1" from the Machine Config Controller.
      
      A new rendered MC that includes the new kubelet-config MC is generated by the controller.
      ~~~
      2023-07-18T21:26:24.722890720+03:00 I0718 18:26:24.722788       1 render_controller.go:510] Generated machineconfig rendered-worker-0d708d32a1e04ec8b02986c40d10828b from 16 configs: 
      [{MachineConfig  00-worker  
      machineconfiguration.openshift.io/v1  } {MachineConfig  01-worker-container-runtime  
      machineconfiguration.openshift.io/v1  } {MachineConfig  01-worker-kubelet  
      machineconfiguration.openshift.io/v1  } {MachineConfig  50-nto-worker  
      machineconfiguration.openshift.io/v1  } {MachineConfig  50-performance-worker  
      machineconfiguration.openshift.io/v1  } {MachineConfig  99-worker-chrony-conf-override  
      machineconfiguration.openshift.io/v1  } {MachineConfig  99-worker-generated-containerruntime  
      machineconfiguration.openshift.io/v1  } {MachineConfig  99-worker-generated-kubelet  
      machineconfiguration.openshift.io/v1  } {MachineConfig  99-worker-generated-kubelet-1  <---------------------------- This one
      machineconfiguration.openshift.io/v1  } {MachineConfig  99-worker-generated-registries  
      machineconfiguration.openshift.io/v1  } {MachineConfig  99-worker-legacy-kdump-configuration  
      machineconfiguration.openshift.io/v1  } {MachineConfig  99-worker-ssh  
      machineconfiguration.openshift.io/v1  } {MachineConfig  coredump-mc-worker  
      machineconfiguration.openshift.io/v1  } {MachineConfig  softpanic-worker  
      machineconfiguration.openshift.io/v1  } {MachineConfig  worker-custom-timezone-configuration  
      machineconfiguration.openshift.io/v1  } {MachineConfig  worker-std-r750-1-combo-sctp  
      machineconfiguration.openshift.io/v1  }]
      2023-07-18T21:26:24.723116641+03:00 I0718 18:26:24.723021       1 event.go:285] Event(v1.ObjectReference{Kind:"MachineConfigPool", Namespace:"", Name:"worker", UID:"e104aa2e-0729-49ae-8022-502cbb414ab1", APIVersion:"machineconfiguration.openshift.io/v1", ResourceVersion:"87008874", FieldPath:""}): type: 'Normal' reason: 'RenderedConfigGenerated' rendered-worker-0d708d32a1e04ec8b02986c40d10828b successfully generated (release version: 4.12.8, controller version: 731341b89e72d53abb349aff98d09e281e471066)
      ~~~
      This rendered MC is then applied to the cluster.
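      
      Which rendered config the pool targets, and what each node actually runs, can be verified via the standard MCO annotations (a minimal sketch):
      ~~~
      # Rendered config currently targeted by the worker pool:
      oc get mcp worker -o jsonpath='{.spec.configuration.name}{"\n"}'
      
      # Per-node current vs. desired config:
      oc get nodes -l node-role.kubernetes.io/worker \
        -o custom-columns='NAME:.metadata.name,CURRENT:.metadata.annotations.machineconfiguration\.openshift\.io/currentConfig,DESIRED:.metadata.annotations.machineconfiguration\.openshift\.io/desiredConfig'
      ~~~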
      
      The differences between the previous and the new rendered MC are the following:
      ~~~
      [nstamate@fedora ~]$ diff rendered-worker-0d708d32a1e04ec8b02986c40d10828b-kubelet-config.yaml rendered-worker-1102262599b5c40ca2c2aae79f1b35d0-kubelet-config.yaml 
      48c48,54
      <   "cpuManagerReconcilePeriod": "0s",
      ---
      >   "cpuManagerPolicy": "static",
      >   "cpuManagerPolicyOptions": {
      >     "full-pcpus-only": "true"
      >   },
      >   "cpuManagerReconcilePeriod": "5s",
      >   "memoryManagerPolicy": "Static",
      >   "topologyManagerPolicy": "single-numa-node",
      54a61,66
      >   "evictionHard": {
      >     "imagefs.available": "15%",
      >     "memory.available": "100Mi",
      >     "nodefs.available": "10%",
      >     "nodefs.inodesFree": "5%"
      >   },
      65,68c77,80
      <   "allowedUnsafeSysctls": [
      <     "fs.mqueue.*",
      <     "net.*"
      <   ],
      ---
      >   "kubeReserved": {
      >     "memory": "500Mi"
      >   },
      >   "reservedSystemCPUs": "0-3,52-55",
      79c91,99
      <   "shutdownGracePeriodCriticalPods": "0s"
      ---
      >   "shutdownGracePeriodCriticalPods": "0s",
      >   "reservedMemory": [
      >     {
      >       "numaNode": 0,
      >       "limits": {
      >         "memory": "1100Mi"
      >       }
      >     }
      >   ]
      ~~~
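      The settings on the ">" side (static CPU manager, single-numa-node topology policy, reservedSystemCPUs, reservedMemory) are characteristic of what NTO renders from a PerformanceProfile into the "performance-worker" KubeletConfig. A hypothetical sketch of such a profile, with values taken from the diff above (the actual profile is in the must-gather; the isolated CPU set and the profile name are assumptions):
      ~~~
      apiVersion: performance.openshift.io/v2
      kind: PerformanceProfile
      metadata:
        name: worker                       # assumption: NTO names the generated KubeletConfig "performance-<profile>"
      spec:
        cpu:
          reserved: "0-3,52-55"            # -> reservedSystemCPUs in the kubelet config
          isolated: "4-51,56-103"          # assumption; not visible in the diff
        numa:
          topologyPolicy: single-numa-node # -> topologyManagerPolicy
        nodeSelector:
          node-role.kubernetes.io/worker: ""
      ~~~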
      The changes are then reverted automatically:
      ~~~
      2023-07-19T10:39:09.804948915+03:00 I0719 07:39:09.804900       1 resources.go:159] Update kubelet-config "performance-worker"
      ~~~
      The Machine Config Controller then targets the previous rendered MC again.
      
      A new rollout starts.
      
      The admin did not perform any action.
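      
      One way to confirm that no human actor issued the update is the kube-apiserver audit log (assuming the default audit policy is enabled); for example:
      ~~~
      # Search the audit logs on the control-plane nodes for updates to
      # KubeletConfig objects and inspect the "user" field of each event:
      oc adm node-logs --role=master --path=kube-apiserver/audit.log \
        | grep '"resource":"kubeletconfigs"' | grep '"verb":"update"'
      ~~~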
      
      Must-gather and other manifests are in the attached case.

      Version-Release number of selected component (if applicable):

      N/A (the MCO log above indicates release version 4.12.8)

      How reproducible:

      Cannot be reproduced; it has happened only once.

      Steps to Reproduce:

      1.
      2.
      3.
      

      Actual results:

      The kubelet config is changed by the Node Tuning Operator at random times.

      Expected results:

      The kubelet config should not be changed by the Node Tuning Operator at random times.

      Additional info:

       
