Details
-
Bug
-
Resolution: Obsolete
-
Undefined
-
None
-
4.16.0
-
Critical
-
No
-
Hypershift Sprint 252, Hypershift Sprint 253
-
2
-
Proposed
-
False
-
Description
Description of problem:
When cluster size tagging is in use and a number of placeholder pods is configured to keep nodes warm, pods of 2 deployments can in some cases be scheduled in different zones but on the same subnet pair (1 pod of each deployment), so the deployments never finish rolling out.
Version-Release number of selected component (if applicable):
4.16.0 - main
How reproducible:
Sometimes
Steps to Reproduce:
1. Set up a management cluster with request-serving machinesets.
2. Configure clustersizingconfiguration to have at least 2 pairs of warm nodes for the smallest size.
Actual results:
For a given pair of deployments, 1 pod of each deployment is scheduled and the other pod is left in Pending.
Expected results:
For a given pair of deployments, all pods are scheduled.
Additional info:
Example state:
$ oc get pods -n hypershift-request-serving-node-placeholders
NAME                                   READY   STATUS    RESTARTS   AGE
placeholder-small-0-7b696687d8-cnf6r   0/1     Pending   0          4h59m
placeholder-small-0-7b696687d8-qb2bz   1/1     Running   0          4h59m
placeholder-small-1-575fc487df-8rgxd   0/1     Pending   0          4h59m
placeholder-small-1-575fc487df-9bt7n   1/1     Running   0          4h59m
Pods from small-0 and small-1 are scheduled on nodes with the same osd-fleet-manager.openshift.io/paired-nodes label, resulting in pending pods that will never schedule.
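The conflict described above can be checked for mechanically. The following is a minimal sketch, not part of HyperShift: it takes pod placements and node labels as plain Python data and reports every osd-fleet-manager.openshift.io/paired-nodes value that hosts pods from more than one placeholder deployment. The function name find_pair_conflicts, the node names (node-a, node-b) and the label value (pair-1) are hypothetical and chosen only for illustration; the pod and deployment names come from the example state.

```python
from collections import defaultdict

# Label named in this issue; its values below are made up for illustration.
PAIR_LABEL = "osd-fleet-manager.openshift.io/paired-nodes"


def find_pair_conflicts(pods, node_labels):
    """Return {pair_value: [deployments]} for node pairs shared by >1 deployment.

    pods:        iterable of (pod_name, deployment_name, node_name or None)
    node_labels: dict mapping node_name -> dict of that node's labels
    """
    deployments_by_pair = defaultdict(set)
    for _pod_name, deployment, node in pods:
        if node is None:
            # Pending pods have no node yet, so they cannot claim a pair.
            continue
        pair = node_labels.get(node, {}).get(PAIR_LABEL)
        if pair is not None:
            deployments_by_pair[pair].add(deployment)
    return {
        pair: sorted(deployments)
        for pair, deployments in deployments_by_pair.items()
        if len(deployments) > 1
    }


# Hypothetical reconstruction of the example state: one Running pod per
# deployment, each on a different node that carries the same pair label.
pods = [
    ("placeholder-small-0-7b696687d8-cnf6r", "placeholder-small-0", None),
    ("placeholder-small-0-7b696687d8-qb2bz", "placeholder-small-0", "node-a"),
    ("placeholder-small-1-575fc487df-8rgxd", "placeholder-small-1", None),
    ("placeholder-small-1-575fc487df-9bt7n", "placeholder-small-1", "node-b"),
]
node_labels = {
    "node-a": {PAIR_LABEL: "pair-1"},
    "node-b": {PAIR_LABEL: "pair-1"},
}

conflicts = find_pair_conflicts(pods, node_labels)
print(conflicts)
```

A non-empty result means two deployments each hold one node of the same pair, which matches the stuck state in this report. On a live cluster the two inputs could be gathered with oc get pods -o wide and oc get nodes -L osd-fleet-manager.openshift.io/paired-nodes.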