Bug
Resolution: Unresolved
Critical
4.18.z
None
OCP Node Sprint 268 (Green)
1
False
Description of problem:
Guaranteed pods fail to start on worker nodes where a performance profile is applied.
Version-Release number of selected component (if applicable):
4.18.2
How reproducible:
Every time
Steps to Reproduce:
1. Apply the following performance profile:
   apiVersion: performance.openshift.io/v2
   kind: PerformanceProfile
   metadata:
     name: performance
   spec:
     cpu:
       isolated: 1-39,41-79
       reserved: 0,40
     machineConfigPoolSelector:
       pools.operator.machineconfiguration.openshift.io/worker: ""
     nodeSelector:
       node-role.kubernetes.io/worker: ""
     numa:
       topologyPolicy: single-numa-node
2. Switch the nodes to cgroup v1:
   apiVersion: config.openshift.io/v1
   kind: Node
   metadata:
     name: cluster
   spec:
     cgroupMode: "v1"
3. Create a guaranteed pod as shown below:
   apiVersion: v1
   kind: Pod
   metadata:
     name: pod1
     # annotations:
     #   cpu-load-balancing.crio.io: "disable"
   spec:
     containers:
     - name: test-container1
       image: registry.hlxcl12.lab.eng.tlv2.redhat.com:5000/cnf-tests:4.14
       command:
       - sleep
       - inf
       resources:
         limits:
           memory: "100Mi"
           cpu: "2"
     nodeSelector:
       kubernetes.io/hostname: worker-0
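The pod in step 3 sets only limits, so it may not be obvious why it counts as guaranteed. Kubernetes defaults omitted requests to the limits, which makes requests equal limits for both CPU and memory. The following is a minimal Python sketch of that classification rule, not the kubelet's actual code. The function name `qos_class` is hypothetical, and quantities are compared as plain strings, which is a simplification (for example, "1000m" and "1" would be treated as different).

```python
def qos_class(containers):
    """Classify a pod's QoS class from its containers' resources (simplified)."""
    any_set = False
    guaranteed = True
    for container in containers:
        resources = container.get("resources", {})
        limits = resources.get("limits", {})
        # Requests that are omitted default to the corresponding limits.
        requests = {**limits, **resources.get("requests", {})}
        if limits or requests:
            any_set = True
        # Guaranteed needs cpu and memory limits, each equal to its request.
        for name in ("cpu", "memory"):
            if name not in limits or requests.get(name) != limits[name]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


# Resources of pod1 from step 3: limits only, no explicit requests.
pod1 = [{"name": "test-container1",
         "resources": {"limits": {"memory": "100Mi", "cpu": "2"}}}]
print(qos_class(pod1))  # prints "Guaranteed"
```

With an integer CPU count, such a pod is also a candidate for exclusive CPUs under the kubelet's static CPU manager policy.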
Actual results:
[root@registry ~]# oc get pods
NAME   READY   STATUS                 RESTARTS   AGE
pod1   0/1     CreateContainerError   0          24s

Events:
  Type     Reason          Age               From               Message
  ----     ------          ----              ----               -------
  Normal   Scheduled       47s               default-scheduler  Successfully assigned default/pod1 to worker-0
  Normal   AddedInterface  48s               multus             Add eth0 [10.131.0.21/23] from ovn-kubernetes
  Normal   Pulling         48s               kubelet            Pulling image "registry.hlxcl12.lab.eng.tlv2.redhat.com:5000/cnf-tests:4.14"
  Normal   Pulled          26s               kubelet            Successfully pulled image "registry.hlxcl12.lab.eng.tlv2.redhat.com:5000/cnf-tests:4.14" in 21.544s (21.544s including waiting). Image size: 1244232582 bytes.
  Warning  Failed          2s (x4 over 26s)  kubelet            Error: container create failed: write file `cpuset.cpus`: Device or resource busy
  Normal   Pulled          2s (x3 over 26s)  kubelet            Container image "registry.hlxcl12.lab.eng.tlv2.redhat.com:5000/cnf-tests:4.14" already present on machine
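Because the failure is a write to `cpuset.cpus`, one quick sanity check is to rule out a malformed CPU layout in the performance profile. The Python sketch below expands the `isolated` and `reserved` strings from step 1 and checks that they do not overlap. The helper name `parse_cpu_list` is hypothetical. The check says nothing about the actual root cause, and it does not verify how many CPUs worker-0 really has.

```python
def parse_cpu_list(spec):
    """Expand a cpuset-style list such as "1-39,41-79" into a set of CPU ids."""
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


# Values copied from the PerformanceProfile in step 1.
isolated = parse_cpu_list("1-39,41-79")
reserved = parse_cpu_list("0,40")

print(len(isolated), len(reserved))   # prints "78 2"
print(isolated.isdisjoint(reserved))  # prints "True"
```

The two sets are disjoint and together cover CPU ids 0 through 79 without gaps, so the profile's CPU partitioning is at least internally consistent.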
Expected results:
The pod should start and run successfully.
Additional info: