Type: Bug
Resolution: Done
Priority: Major
Affects Version/s: 4.14.z, 4.15
Category: Quality / Stability / Reliability
Description of problem:
On a fresh cluster deployment, applying a PerformanceProfile or KubeletConfig with systemReserved changes for the first time (no reserved-memory-related changes were made previously) causes ALL cluster nodes to reboot, even nodes the change does not target. Important:
- Nodes the change is not relevant to are rebooted anyway, but the change is not applied to them.
- This all-nodes reboot happens only once, on the clean deployment; on subsequent changes to system-reserved memory/CPU via PerformanceProfile or KubeletConfig, only the relevant nodes are rebooted.
Version-Release number of selected component (if applicable):
Client Version: 4.15.0-ec.3
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: 4.15.0-ec.3
Kubernetes Version: v1.28.3+20a5764

The same issue is observed on 4.14 as well:
Client Version: 4.14.6
Kustomize Version: v5.0.1
Server Version: 4.14.6
Kubernetes Version: v1.27.8+4fab27b
How reproducible:
Always
Steps to Reproduce:
1. Deploy a cluster (even a minimal cluster is suitable to reproduce this issue: 3 masters + 2 workers).
2. Apply a KubeletConfig with a reserved-memory change. For example:
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: set-sysreserved-master
  resourceVersion: "774812"
  uid: 3a575d56-d5ac-4e12-bbdf-5f7c328ed705
spec:
  kubeletConfig:
    systemReserved:
      cpu: 500m
      memory: 27Gi
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/master: ""
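A sketch of how the reboot behavior can be observed while applying the change (the file name set-sysreserved-master.yaml is assumed; only standard oc commands are used):

```shell
# Assumes the KubeletConfig above is saved as set-sysreserved-master.yaml.
oc apply -f set-sysreserved-master.yaml

# Watch the MachineConfigPools: only the "master" pool is expected to go
# UPDATING, but with this bug the "worker" pool starts updating as well.
oc get mcp -w

# Correlate with node reboots: rebooting nodes cycle through
# NotReady,SchedulingDisabled.
oc get nodes -w
```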
Actual results:
The KubeletConfig was created, the change was applied to all master nodes, and the master nodes were rebooted. But simultaneously, a rolling reboot of the worker nodes started.
Expected results:
The KubeletConfig is created, the change is applied to all master nodes, and the master nodes are rebooted; no worker reboots should be observed.
Additional info:
PerformanceProfile example:
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: performance-samsung-cnf
spec:
  additionalKernelArgs:
  - mmio_stale_data=off
  - mds=off
  - tsx_async_abort=off
  - retbleed=off
  cpu:
    isolated: 2-19,22-39,42-59,62-79
    reserved: 0-1,20-21,40-41,60-61
  globallyDisableIrqLoadBalancing: false
  hugepages:
    defaultHugepagesSize: 2M
    pages:
    - count: 32768
      node: 0
      size: 2M
    - count: 32768
      node: 1
      size: 2M
  machineConfigPoolSelector:
    machineconfiguration.openshift.io/role: samsung-cnf
  nodeSelector:
    node-role.kubernetes.io/samsung-cnf: ""
  numa:
    topologyPolicy: single-numa-node
  realTimeKernel:
    enabled: false
  workloadHints:
    highPowerConsumption: false
    realTime: false
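To see whether unrelated pools are being rolled because their rendered MachineConfig changed, the rendered-config name each pool points at can be compared before and after applying the profile. A sketch, assuming the profile above is saved as performance-samsung-cnf.yaml:

```shell
# Record which rendered MachineConfig each pool currently targets.
oc get mcp -o custom-columns=NAME:.metadata.name,CONFIG:.spec.configuration.name

oc apply -f performance-samsung-cnf.yaml   # file name assumed

# Re-check: if the worker pool's rendered-config name changed even though the
# profile only targets the samsung-cnf pool, that rendered-config churn is
# what triggers the unexpected rolling reboot.
oc get mcp -o custom-columns=NAME:.metadata.name,CONFIG:.spec.configuration.name
```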