Type: Bug
Resolution: Done
Priority: Major
Affects Versions: 4.14.z, 4.15
Description of problem:
On a fresh cluster deployment, applying a PerformanceProfile or KubeletConfig with systemReserved changes for the first time (no reserved-memory-related changes were made previously) causes ALL cluster nodes to reboot, even nodes the change is unrelated to. Important:
- Nodes the change is not relevant to were simply rebooted; the change was not applied to them.
- This all-nodes reboot happens only once, on a clean deployment. On subsequent changes to the system-reserved memory/CPU via PerformanceProfile or KubeletConfig, only the relevant nodes were rebooted.
Version-Release number of selected component (if applicable):
Client Version: 4.15.0-ec.3
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: 4.15.0-ec.3
Kubernetes Version: v1.28.3+20a5764

The same issue was observed on 4.14 as well:
Client Version: 4.14.6
Kustomize Version: v5.0.1
Server Version: 4.14.6
Kubernetes Version: v1.27.8+4fab27b
How reproducible:
always
Steps to Reproduce:
1. Deploy a cluster (even a minimal cluster is enough to reproduce this issue: 3 masters + 2 workers).
2. Apply a KubeletConfig with a reserved-memory change. For example:

   apiVersion: machineconfiguration.openshift.io/v1
   kind: KubeletConfig
   metadata:
     name: set-sysreserved-master
     resourceVersion: "774812"
     uid: 3a575d56-d5ac-4e12-bbdf-5f7c328ed705
   spec:
     kubeletConfig:
       systemReserved:
         cpu: 500m
         memory: 27Gi
     machineConfigPoolSelector:
       matchLabels:
         pools.operator.machineconfiguration.openshift.io/master: ""
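The reproduction steps above can be driven from the CLI roughly as follows; the file name is hypothetical, and the watch commands are only a suggested way to see which pools and nodes actually reboot (they are not from the report):

```shell
# Apply the KubeletConfig (hypothetical file name holding the manifest above)
oc apply -f set-sysreserved-master.yaml

# Watch the MachineConfigPools: only the "master" pool should go UPDATING=True
oc get mcp -w

# Watch node status: nodes cycling through SchedulingDisabled/NotReady
# indicate which nodes are being drained and rebooted
oc get nodes -w
```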
Actual results:
The KubeletConfig was created, the change was applied to all master nodes, and the master nodes were rebooted. But simultaneously, a rolling reboot of the worker nodes was started.
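One way to check whether the worker pool was actually handed a new rendered MachineConfig (which would explain a legitimate reboot) is to compare the pool's rendered config name and the per-node config annotations before and after applying the KubeletConfig. A sketch, assuming `oc` is logged in to the affected cluster:

```shell
# Rendered config currently targeted by the worker pool; this name changes
# only when the pool receives a new MachineConfig
oc get mcp worker -o jsonpath='{.spec.configuration.name}{"\n"}'

# Per-node view: desired vs. current config annotations on the worker nodes
oc get nodes -l node-role.kubernetes.io/worker \
  -o custom-columns=NAME:.metadata.name,\
DESIRED:'.metadata.annotations.machineconfiguration\.openshift\.io/desiredConfig',\
CURRENT:'.metadata.annotations.machineconfiguration\.openshift\.io/currentConfig'
```

If the rendered config name for the worker pool is unchanged while the workers still reboot, that supports the description above: the nodes cycled without the change actually being applied to them.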
Expected results:
The KubeletConfig is created, the change is applied to all master nodes, and the master nodes are rebooted, but no worker reboots are observed.
Additional info:
PerformanceProfile example:

   apiVersion: performance.openshift.io/v2
   kind: PerformanceProfile
   metadata:
     name: performance-samsung-cnf
   spec:
     additionalKernelArgs:
     - mmio_stale_data=off
     - mds=off
     - tsx_async_abort=off
     - retbleed=off
     cpu:
       isolated: 2-19,22-39,42-59,62-79
       reserved: 0-1,20-21,40-41,60-61
     globallyDisableIrqLoadBalancing: false
     hugepages:
       defaultHugepagesSize: 2M
       pages:
       - count: 32768
         node: 0
         size: 2M
       - count: 32768
         node: 1
         size: 2M
     machineConfigPoolSelector:
       machineconfiguration.openshift.io/role: samsung-cnf
     nodeSelector:
       node-role.kubernetes.io/samsung-cnf: ""
     numa:
       topologyPolicy: single-numa-node
     realTimeKernel:
       enabled: false
     workloadHints:
       highPowerConsumption: false
       realTime: false