Bug
Resolution: Unresolved
Critical
4.18.z
Quality / Stability / Reliability
False
OCP Node Sprint 268 (Green)
1
Description of problem:
Guaranteed pods fail to start on worker nodes where a performance profile is applied.
Version-Release number of selected component (if applicable):
4.18.2
How reproducible:
Every time
Steps to Reproduce:
1. Apply the following performance profile:
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: performance
spec:
  cpu:
    isolated: 1-39,41-79
    reserved: 0,40
  machineConfigPoolSelector:
    pools.operator.machineconfiguration.openshift.io/worker: ""
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  numa:
    topologyPolicy: single-numa-node
2. Switch the nodes to cgroup v1:
apiVersion: config.openshift.io/v1
kind: Node
metadata:
  name: cluster
spec:
  cgroupMode: "v1"
3. Create a guaranteed pod as shown below:
apiVersion: v1
kind: Pod
metadata:
  name: pod1
  # annotations:
  #   cpu-load-balancing.crio.io: "disable"
spec:
  containers:
  - name: test-container1
    image: registry.hlxcl12.lab.eng.tlv2.redhat.com:5000/cnf-tests:4.14
    command:
    - sleep
    - inf
    resources:
      limits:
        memory: "100Mi"
        cpu: "2"
  nodeSelector:
    kubernetes.io/hostname: worker-0
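For anyone triaging this, the reproduction inputs can be sanity-checked offline before looking at the node. The sketch below is only an illustration, not part of the reproducer: `parse_cpuset` and `qos_class` are hypothetical helper names, and the QoS logic is a simplified reading of the Kubernetes rules (regular containers only, quantities compared as plain strings). It checks that the isolated and reserved CPU sets from step 1 do not overlap and that the pod from step 3 is classified as Guaranteed, since omitted requests default to the limits.

```python
def parse_cpuset(spec: str) -> set:
    """Expand a cpuset list such as "1-39,41-79" into a set of CPU ids."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def qos_class(containers: list) -> str:
    """Simplified QoS classification (assumption: no init containers,
    no quantity normalization such as "1000m" versus "1")."""
    any_set = False
    guaranteed = True
    for container in containers:
        resources = container.get("resources", {})
        limits = resources.get("limits", {})
        # Requests that are omitted default to the corresponding limits.
        requests = {**limits, **resources.get("requests", {})}
        if limits or requests:
            any_set = True
        for name in ("cpu", "memory"):
            if name not in limits or requests.get(name) != limits[name]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


# Values copied from the performance profile and pod spec in this report.
isolated = parse_cpuset("1-39,41-79")
reserved = parse_cpuset("0,40")
pod1_containers = [
    {
        "name": "test-container1",
        "resources": {"limits": {"memory": "100Mi", "cpu": "2"}},
    }
]

covered = isolated | reserved
print("isolated/reserved overlap:", sorted(isolated & reserved))
print("CPU ids covered:", min(covered), "to", max(covered), "count", len(covered))
print("QoS class:", qos_class(pod1_containers))
```

With the values from this report, the two sets are disjoint and together cover CPU ids 0 to 79, and the pod is classified as Guaranteed with an integer CPU limit, so the inputs themselves look consistent.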
Actual results:
[root@registry ~]# oc get pods
NAME   READY   STATUS                 RESTARTS   AGE
pod1   0/1     CreateContainerError   0          24s
Events:
  Type     Reason          Age               From               Message
  ----     ------          ----              ----               -------
  Normal   Scheduled       47s               default-scheduler  Successfully assigned default/pod1 to worker-0
  Normal   AddedInterface  48s               multus             Add eth0 [10.131.0.21/23] from ovn-kubernetes
  Normal   Pulling         48s               kubelet            Pulling image "registry.hlxcl12.lab.eng.tlv2.redhat.com:5000/cnf-tests:4.14"
  Normal   Pulled          26s               kubelet            Successfully pulled image "registry.hlxcl12.lab.eng.tlv2.redhat.com:5000/cnf-tests:4.14" in 21.544s (21.544s including waiting). Image size: 1244232582 bytes.
  Warning  Failed          2s (x4 over 26s)  kubelet            Error: container create failed: write file `cpuset.cpus`: Device or resource busy
  Normal   Pulled          2s (x3 over 26s)  kubelet            Container image "registry.hlxcl12.lab.eng.tlv2.redhat.com:5000/cnf-tests:4.14" already present on machine
Expected results:
The pod should start and run successfully.
Additional info: