Epic Goal
Introduce a sharding algorithm for ArgoCD Application controllers that distributes load across the controller pods based on the number of applications, as opposed to the existing mechanisms, which are based on the number of target clusters.
Why is this important?
Currently, ArgoCD Application controllers shard the load (Application syncs) based only on the target cluster, which limits scalability in scenarios where many applications target a single cluster: you can only scale vertically, not horizontally.
Ideally, you should be able to have 1000 apps spread unequally across 10 clusters and have all the Application controllers managing an equal share of the apps.
Example:
- Customer has Cluster A and Cluster B, where Cluster A hosts 1000 Applications and Cluster B hosts 10 Applications
- ArgoCD Application controllers are scaled to 2 replicas
- One replica handles Cluster A and hence reconciles 1000 Applications, while the other replica handles Cluster B and reconciles only 10 Applications
- Currently, to avoid OOM kills, you have to tune the RAM and CPU requests and limits of both replicas to accommodate 1000 Applications
- This wastes resources, as the Cluster B application controller doesn't need that many resources: it's only reconciling 10 Applications
The goal of this RFE is to enable ArgoCD to spread the load equally across the Application controller shards, so that each shard handles 505 Applications (500 from Cluster A and 5 from Cluster B). This would allow much easier scaling (both horizontal and vertical) while using the available resources efficiently.
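The arithmetic in the example above can be checked with a small sketch. This is not ArgoCD code: the function names `shard_by_cluster` and `shard_by_application`, the application names, and the round-robin assignment are assumptions made purely for illustration. The actual algorithm is what this epic is meant to define.

```python
from collections import Counter

def shard_by_cluster(apps, replicas):
    """Cluster-based sharding: every app follows its target cluster's shard.
    Hypothetical: clusters are mapped to shards round-robin in sorted order."""
    clusters = sorted({cluster for _, cluster in apps})
    cluster_shard = {c: i % replicas for i, c in enumerate(clusters)}
    return Counter(cluster_shard[cluster] for _, cluster in apps)

def shard_by_application(apps, replicas):
    """Application-based sharding: apps are dealt out round-robin,
    regardless of which cluster they target."""
    return Counter(i % replicas for i, _ in enumerate(apps))

# The scenario from the example: 1000 apps on Cluster A, 10 on Cluster B.
apps = [(f"a-{i}", "cluster-a") for i in range(1000)]
apps += [(f"b-{i}", "cluster-b") for i in range(10)]

by_cluster = shard_by_cluster(apps, 2)
by_app = shard_by_application(apps, 2)
print(dict(by_cluster))  # {0: 1000, 1: 10}
print(dict(by_app))      # {0: 505, 1: 505}
```

With cluster-based sharding, both replicas have to be sized for the 1000-application shard; with application-based sharding, each can be sized for 505.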
Scenarios
TBD
Acceptance Criteria (Mandatory)
- CI - MUST be running successfully with tests automated
- Release Technical Enablement - Provide necessary release enablement details and documents.
- A sharding algorithm exists that shards application controllers based on number of applications assigned to each shard
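As a rough illustration of what "based on number of applications assigned to each shard" could mean, the sketch below assigns each application to the shard that currently holds the fewest. The function `assign_least_loaded` and the application names are hypothetical; this is one possible strategy under that assumption, not the algorithm ArgoCD implements.

```python
import heapq

def assign_least_loaded(app_names, replicas):
    """Assign each app to whichever shard currently holds the fewest apps."""
    # Min-heap of (current app count, shard index); ties go to the lower index.
    heap = [(0, shard) for shard in range(replicas)]
    heapq.heapify(heap)
    assignment = {}
    for name in app_names:
        count, shard = heapq.heappop(heap)
        assignment[name] = shard
        heapq.heappush(heap, (count + 1, shard))
    return assignment

names = [f"app-{i}" for i in range(1010)]
assignment = assign_least_loaded(names, replicas=3)
loads = [list(assignment.values()).count(s) for s in range(3)]
# The shard sizes never differ by more than one application.
assert max(loads) - min(loads) <= 1
```

An automated test for this criterion could assert the same invariant: the difference between the largest and smallest shard is at most one application.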
Dependencies (internal and external)
TBD
Previous Work (Optional):
TBD
Open questions:
TBD
Done Checklist
- Acceptance criteria are met
- Non-functional properties of the Feature have been validated (such as performance, resource, UX, security or privacy aspects)
- User Journey automation is delivered
- Support and SRE teams are provided with enough skills to support the feature in production environment
- is related to: RFE-5356 [GitOps] application-controllers sharding based on Applications (Accepted)