Bug
Resolution: Unresolved
Normal
4.18, 4.19, 4.20
None
Description of problem:
In a recent hackathon with Amadeus, we found that scaling nodes on GCP (from 0 to 400) was bottlenecked by the sequential processing of reconcile requests in the Machine API provider for GCP. Adding the ability to configure reconcile concurrency, and then scaling the nodes with 10 reconciles executing in parallel, significantly improved performance.
Version-Release number of selected component (if applicable):
4.20 and below
How reproducible:
100%
Steps to Reproduce:
1. Create an OCP cluster on GCP.
2. Scale several machinesets to a total of around 400 nodes.
3. Observe that machines take approximately 20 minutes to join the cluster.

Detailed steps shared by Zhaohua Sun:
1. Set up a cluster with flexy-install. You can rebuild this job; just update INSTANCE_NAME_PREFIX and LAUNCHER_VARS, adding the following to LAUNCHER_VARS:
   vm_type_masters: 'n2-standard-16'
   vm_type_workers: 'n2-standard-2'
2. Create infra nodes. You can rebuild this job; update BUILD_NUMBER with your flexy job ID.
3. Once the above are done, scale up the machinesets. As described in this bug, I scaled to 400 nodes in three rounds.
   First round:
   oc scale machineset zhsungcp-djlkm-worker-b --replicas 50
   oc scale machineset zhsungcp-djlkm-worker-c --replicas 50
   oc scale machineset zhsungcp-djlkm-worker-d --replicas 50
   Second round:
   oc scale machineset zhsungcp-djlkm-worker-b --replicas 100
   oc scale machineset zhsungcp-djlkm-worker-c --replicas 100
   oc scale machineset zhsungcp-djlkm-worker-d --replicas 100
   Third round:
   oc scale machineset zhsungcp-djlkm-worker-b --replicas 130
   oc scale machineset zhsungcp-djlkm-worker-c --replicas 130
   oc scale machineset zhsungcp-djlkm-worker-d --replicas 140
Actual results:
Nodes take a significant time to join the cluster (approximately 20 minutes when scaling to around 400 nodes).
Expected results:
Nodes should join the cluster quickly
Additional info:
- is cloned by: OCPBUGS-59386 [release-4.19] GCP scaling is slow when scaling large volumes of nodes (Closed)
- is depended on by: OCPBUGS-59386 [release-4.19] GCP scaling is slow when scaling large volumes of nodes (Closed)
- links to