-
Bug
-
Resolution: Won't Do
-
Normal
-
None
-
4.13
-
Moderate
-
No
-
Rejected
-
False
-
Description of problem:
A pod that requests a GPU, has a volume assigned, and sets a nodeSelector on "topology.kubernetes.io/zone" fails to trigger OpenShift Container Platform 4 - Node scale-up:
I0511 12:31:41.594376 1 request.go:1171] Response Body: {"kind":"Scale","apiVersion":"autoscaling/v1","metadata":{"name":"foo-mbr2h-gpu-us-east-2b","namespace":"openshift-machine-api","uid":"c75f1686-b15f-4fa5-bee0-a78ea711a3d5","resourceVersion":"4502637","creationTimestamp":"2023-05-11T11:23:56Z"},"spec":{},"status":{"replicas":0}}
I0511 12:31:41.594532 1 clusterapi_provider.go:67] discovered node group: MachineSet/openshift-machine-api/foo-mbr2h-gpu-us-east-2b (min: 0, max: 2, replicas: 0)
I0511 12:31:41.594691 1 binder.go:724] "PVC is not bound" PVC="project-300/gpu"
I0511 12:31:41.594799 1 scale_up.go:93] Pod gpu-848f7d47d9-9rzqz can't be scheduled on MachineSet/openshift-machine-api/foo-mbr2h-gpu-us-east-2b, predicate checking error: node(s) didn't match Pod's node affinity/selector; predicateName=NodeAffinity; reasons: node(s) didn't match Pod's node affinity/selector; debugInfo=
I0511 12:31:41.594826 1 scale_up.go:262] No pod can fit to MachineSet/openshift-machine-api/foo-mbr2h-gpu-us-east-2b
I0511 12:31:41.594883 1 binder.go:724] "PVC is not bound" PVC="project-300/gpu"
I0511 12:31:41.594920 1 scale_up.go:93] Pod gpu-848f7d47d9-9rzqz can't be scheduled on MachineSet/openshift-machine-api/foo-mbr2h-gpu-us-east-2a, predicate checking error: node(s) didn't match Pod's node affinity/selector; predicateName=NodeAffinity; reasons: node(s) didn't match Pod's node affinity/selector; debugInfo=
I0511 12:31:41.594939 1 scale_up.go:262] No pod can fit to MachineSet/openshift-machine-api/foo-mbr2h-gpu-us-east-2a
I0511 12:31:41.595005 1 binder.go:724] "PVC is not bound" PVC="project-300/gpu"
I0511 12:31:41.595062 1 scale_up.go:93] Pod gpu-848f7d47d9-9rzqz can't be scheduled on MachineSet/openshift-machine-api/foo-mbr2h-gpu-us-east-2c, predicate checking error: node(s) didn't match Pod's node affinity/selector; predicateName=NodeAffinity; reasons: node(s) didn't match Pod's node affinity/selector; debugInfo=
I0511 12:31:41.595093 1 scale_up.go:262] No pod can fit to MachineSet/openshift-machine-api/foo-mbr2h-gpu-us-east-2c
I0511 12:31:41.595157 1 binder.go:724] "PVC is not bound" PVC="project-300/gpu"
I0511 12:31:41.595203 1 scale_up.go:93] Pod gpu-848f7d47d9-9rzqz can't be scheduled on MachineSet/openshift-machine-api/foo-mbr2h-ossm-us-east-2a, predicate checking error: node(s) didn't match Pod's node affinity/selector; predicateName=NodeAffinity; reasons: node(s) didn't match Pod's node affinity/selector; debugInfo=
I0511 12:31:41.595224 1 scale_up.go:262] No pod can fit to MachineSet/openshift-machine-api/foo-mbr2h-ossm-us-east-2a
I0511 12:31:41.595299 1 binder.go:724] "PVC is not bound" PVC="project-300/gpu"
I0511 12:31:41.595378 1 scale_up.go:93] Pod gpu-848f7d47d9-9rzqz can't be scheduled on MachineSet/openshift-machine-api/foo-mbr2h-ossm-us-east-2b, predicate checking error: node(s) didn't match Pod's node affinity/selector; predicateName=NodeAffinity; reasons: node(s) didn't match Pod's node affinity/selector; debugInfo=
I0511 12:31:41.595419 1 scale_up.go:262] No pod can fit to MachineSet/openshift-machine-api/foo-mbr2h-ossm-us-east-2b
I0511 12:31:41.595440 1 scale_up.go:267] No expansion options
When the nodeSelector is removed, the OpenShift Container Platform 4 - Node scale-up is triggered as expected.
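When triaging a longer autoscaler log, it can help to tally which predicate rejected the pod for each node group. The following is a minimal sketch, not part of the original report: the function name rejections_by_predicate and the regular expression are illustrative assumptions, written against the scale_up.go:93 message format seen in the log above, and the sample text holds two of those entries verbatim.

```python
import re
from collections import defaultdict

# Two scale_up.go:93 entries copied from the autoscaler log in this report.
LOG = (
    "I0511 12:31:41.594799 1 scale_up.go:93] Pod gpu-848f7d47d9-9rzqz can't be "
    "scheduled on MachineSet/openshift-machine-api/foo-mbr2h-gpu-us-east-2b, "
    "predicate checking error: node(s) didn't match Pod's node affinity/selector; "
    "predicateName=NodeAffinity; reasons: node(s) didn't match Pod's node "
    "affinity/selector; debugInfo=\n"
    "I0511 12:31:41.594920 1 scale_up.go:93] Pod gpu-848f7d47d9-9rzqz can't be "
    "scheduled on MachineSet/openshift-machine-api/foo-mbr2h-gpu-us-east-2a, "
    "predicate checking error: node(s) didn't match Pod's node affinity/selector; "
    "predicateName=NodeAffinity; reasons: node(s) didn't match Pod's node "
    "affinity/selector; debugInfo=\n"
)

# Assumed pattern for the rejection message; adjust if the log format differs.
REJECTION = re.compile(
    r"Pod (?P<pod>\S+) can't be scheduled on (?P<group>\S+), "
    r"predicate checking error: .*?; predicateName=(?P<predicate>\w+);"
)


def rejections_by_predicate(log_text):
    """Map each failing predicate name to the node groups that reported it."""
    groups = defaultdict(list)
    for match in REJECTION.finditer(log_text):
        groups[match.group("predicate")].append(match.group("group"))
    return dict(groups)


if __name__ == "__main__":
    for predicate, node_groups in rejections_by_predicate(LOG).items():
        print(predicate, len(node_groups), node_groups)
```

Run against the full log, a tally like this would show whether NodeAffinity is the only predicate involved or whether other predicates also reject the pod.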
Version-Release number of selected component (if applicable):
OpenShift Container Platform 4.10, 4.11, 4.12 and 4.13
How reproducible:
Always
Steps to Reproduce:
1. Install OpenShift Container Platform 4
2. Create MachineSet with MachineAutoscaler in 3 availability zones with GPU instanceType
3. Create deployment referencing a PVC that is not Bound, with nvidia.com/gpu: "1" set in request and nodeSelector with topology.kubernetes.io/zone set to an available zone.
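The deployment from step 3 could look roughly like the manifest built below. This is a sketch under stated assumptions rather than the reporter's actual manifest: the namespace "project-300" and PVC name "gpu" are taken from the log output, the zone "us-east-2a" is inferred from the MachineSet names, and the image, mount path, labels, and function name gpu_deployment are hypothetical. The GPU count is set in both requests and limits, because Kubernetes expects a limit for extended resources and requires it to equal the request.

```python
import json


def gpu_deployment(zone, namespace="project-300", pvc_name="gpu",
                   image="registry.example.com/gpu-workload:latest"):
    """Build a Deployment manifest with a GPU request, a PVC volume, and a zone nodeSelector.

    The image, labels, and mount path are placeholders chosen for illustration.
    """
    labels = {"app": "gpu"}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "gpu", "namespace": namespace},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Pin the pod to a single availability zone.
                    "nodeSelector": {"topology.kubernetes.io/zone": zone},
                    "containers": [{
                        "name": "gpu",
                        "image": image,
                        "resources": {
                            "requests": {"nvidia.com/gpu": "1"},
                            "limits": {"nvidia.com/gpu": "1"},
                        },
                        "volumeMounts": [{"name": "data", "mountPath": "/data"}],
                    }],
                    # The referenced PVC is expected to be unbound at creation time.
                    "volumes": [{
                        "name": "data",
                        "persistentVolumeClaim": {"claimName": pvc_name},
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(gpu_deployment("us-east-2a"), indent=2))
```

The printed JSON can be saved to a file and applied with oc apply -f, which accepts JSON as well as YAML manifests.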
Actual results:
0s Normal NotTriggerScaleUp pod/gpu-5bcb679b75-6vc9v pod didn't trigger scale-up: 5 node(s) didn't match Pod's node affinity/selector
0s Normal NotTriggerScaleUp pod/gpu-5bcb679b75-6vc9v pod didn't trigger scale-up: 5 node(s) didn't match Pod's node affinity/selector
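For reading the event message: a nodeSelector is satisfied only when every key/value pair in it appears among a node's labels, and an empty selector is satisfied by any node. The sketch below illustrates that rule only. The label sets are hypothetical, since the ticket does not list the labels on the nodes the autoscaler evaluated, and matches_node_selector is an illustrative name, not a function from the autoscaler or scheduler.

```python
def matches_node_selector(node_selector, node_labels):
    """Return True when every selector key/value pair is present in the node's labels."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


# Hypothetical label sets, for illustration only; the ticket does not list node labels.
selector = {"topology.kubernetes.io/zone": "us-east-2a"}
node_with_zone_label = {"kubernetes.io/os": "linux", "topology.kubernetes.io/zone": "us-east-2a"}
node_without_zone_label = {"kubernetes.io/os": "linux"}

if __name__ == "__main__":
    print(matches_node_selector(selector, node_with_zone_label))
    print(matches_node_selector(selector, node_without_zone_label))
    print(matches_node_selector({}, node_without_zone_label))
```

The third call uses an empty selector, which is the situation after the nodeSelector is removed from the pod.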
Expected results:
A scale-up of an OpenShift Container Platform 4 - Node with an available GPU is triggered, and the pod is eventually scheduled on the newly added OpenShift Container Platform 4 - Node.
Additional info:
A similar problem was reported in https://bugzilla.redhat.com/show_bug.cgi?id=1891551 but has since been solved. It is unclear to what extent the two are related, but with OpenShift Container Platform 4.13 as the latest tested version, that fix should already be included, so the earlier problem should no longer occur.
- is related to
-
OCPBUGS-6979 The OpenShift autoscaler does not trigger a scale-up for a MachineAutoscaler with "minReplicas: 0" for Pods that define ephemeral-storage requests.
- Closed