Bug
Resolution: Done
4.20
Description of problem:
Pods managed by Kueue that specify an explicit runAsUser in their securityContext become permanently stuck in SchedulingGated status. While the corresponding Workload object is successfully created and reaches the Admitted state (with quota reserved), the kueue.x-k8s.io/admission scheduling gate is never removed from the Pod spec.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Create the Kueue resources (ResourceFlavor, ClusterQueue and LocalQueue) from "resources.yaml".
2. Create a pod without runAsUser from "pod1.yaml".
3. Create a pod with runAsUser from "pod2.yaml".
4. Check whether the pods are running.
Actual results:
The pod created without runAsUser is Running, while the second one, created with runAsUser, remains SchedulingGated:
❯ oc get pods --namespace test-bug
NAME                   READY   STATUS            RESTARTS       AGE
pod-busybox-no-uid     1/1     Running           1 (114s ago)   3m55s
pod-busybox-uid-1000   0/1     SchedulingGated   0              3m3s
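In a namespace with many pods, the stuck ones can be picked out of the same listing. The following is a minimal sketch, assuming the default `oc get pods` column layout (header row first, STATUS in the third column); the function name `gated_pods` is hypothetical and not part of oc:

```shell
# Hypothetical helper: read `oc get pods` output on stdin and print the
# names of pods whose STATUS column is SchedulingGated.
# Assumes the default column order NAME READY STATUS RESTARTS AGE.
gated_pods() {
  # Skip the header row (NR > 1) and compare the third whitespace-separated field.
  awk 'NR > 1 && $3 == "SchedulingGated" { print $1 }'
}

# Example (requires cluster access):
#   oc get pods --namespace test-bug | gated_pods
```

Because the RESTARTS column can contain spaces (e.g. `1 (114s ago)`) but comes after STATUS, matching on the third field is still safe for this layout.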
Expected results:
Both pods should be running
Additional info:
When an explicit UID is provided, OpenShift admits the Pod under the nonroot-v2 SCC instead of restricted-v2:
# No UID Pod -> restricted-v2
❯ oc get pod pod-busybox-no-uid -o jsonpath='{.metadata.annotations.openshift\.io/scc}'
restricted-v2
# UID 1000 Pod -> nonroot-v2
❯ oc get pod pod-busybox-uid-1000 -o jsonpath='{.metadata.annotations.openshift\.io/scc}'
nonroot-v2
Scheduling gate status (the pod without a UID has no gates left; the pod with UID 1000 still carries both Kueue gates):
❯ oc get pod pod-busybox-no-uid -o jsonpath='{.spec.schedulingGates}' --namespace test-bug
❯ oc get pod pod-busybox-uid-1000 -o jsonpath='{.spec.schedulingGates}' --namespace test-bug
[{"name":"kueue.x-k8s.io/admission"},{"name":"kueue.x-k8s.io/topology"}]
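To check a pod for leftover Kueue gates without reading the raw JSON by eye, the jsonpath output can be filtered for gate names under the `kueue.x-k8s.io/` prefix. This is a minimal sketch, assuming the compact JSON format oc prints for `{.spec.schedulingGates}` (no spaces around `:`) and that `grep` and `sed` are available; the function name `kueue_gates` is hypothetical:

```shell
# Hypothetical helper: given the JSON printed by
#   oc get pod NAME -o jsonpath='{.spec.schedulingGates}'
# print one kueue.x-k8s.io/* gate name per line (nothing if no such gate remains).
# Assumes compact JSON such as [{"name":"kueue.x-k8s.io/admission"}].
kueue_gates() {
  # Extract each "name":"kueue.x-k8s.io/..." pair, then strip the key and quotes.
  grep -o '"name":"kueue\.x-k8s\.io/[^"]*"' <<<"$1" \
    | sed -e 's/^"name":"//' -e 's/"$//'
}

# Example (requires cluster access):
#   kueue_gates "$(oc get pod pod-busybox-uid-1000 -o jsonpath='{.spec.schedulingGates}' --namespace test-bug)"
```

For an admitted pod, the expectation in this report is that the function prints nothing.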
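P.S.: To discover the valid runAsUser range for a namespace:
❯ oc get namespace namespace_name -o jsonpath='{.metadata.annotations.openshift\.io/sa\.scc\.uid-range}'
1000790000/10000
The value is in start/size form: the range starts at UID 1000790000 and covers 10000 UIDs (1000790000 to 1000799999), so UID 1000 falls outside it.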
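The range check can be scripted so a manifest's runAsUser is validated against the namespace annotation before the pod is created. This is a minimal sketch, assuming the annotation value has the `start/size` form shown; the function name `uid_in_range` is hypothetical. A UID outside the range is consistent with the pod being admitted under nonroot-v2 rather than restricted-v2:

```shell
# Hypothetical helper: succeed (exit 0) if UID lies inside an OpenShift
# "start/size" UID range such as the openshift.io/sa.scc.uid-range annotation.
uid_in_range() {
  local range=$1 uid=$2
  local start=${range%%/*}   # text before the slash, e.g. 1000790000
  local size=${range##*/}    # text after the slash, e.g. 10000
  # Valid UIDs are start .. start + size - 1.
  (( uid >= start && uid < start + size ))
}

# Example (requires cluster access):
#   range=$(oc get namespace test-bug -o jsonpath='{.metadata.annotations.openshift\.io/sa\.scc\.uid-range}')
#   uid_in_range "$range" 1000 || echo "UID 1000 is outside $range"
```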
Relates to: OCPKUEUE-511 [Release] - Upgrade testing (Closed)