Spike
Resolution: Done
SECFLOWOTL-172 - Improve performance and reduce footprint of OpenShift GitOps
Sprints: GITOPS Core Sprint 3253, GITOPS Core Sprint 3254, GitOps Tangerine - Sprint 3264
Story (Required)
resource.inclusions / resource.exclusions are static mechanisms to control which resource kinds are watched and stored in the cluster cache. Explore how to implement a dynamic watch that watches only those resources that are managed by an Argo application, instead of watching all resources.
Background (Required)
In an OpenShift based setup (or a similar Kubernetes setup) with a huge number of CRDs (~200), not all CRDs are meant to be managed by ArgoCD. The current implementation of the cluster cache creates a watch for each resource type per namespace, opening too many watch connections to the API server. This causes client-side throttling, as seen in the error message below.
I0117 11:37:10.038643 1 request.go:601] Waited for 1.001246788s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/api/v1/namespaces/test-ns-011/secrets?limit=500...
When we tested with ~100 namespaces, too many watches were created and requests were throttled. This can be partially mitigated with the resource.inclusions / resource.exclusions settings, but since these are static, users have to know in advance exactly which resource types ArgoCD will need to manage.
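For context, the static mechanism reduces to a fixed group/kind match decided at configuration time. Below is a minimal Go sketch of that decision; the type and function names are illustrative, not the actual ArgoCD settings code:

```go
package main

import "fmt"

// filteredResource mirrors the shape of an entry in the argocd-cm
// resource.exclusions / resource.inclusions lists (simplified; the real
// setting also matches on cluster URLs).
type filteredResource struct {
	APIGroups []string
	Kinds     []string
}

// matchAny reports whether value matches any pattern; "*" matches everything.
func matchAny(patterns []string, value string) bool {
	for _, p := range patterns {
		if p == "*" || p == value {
			return true
		}
	}
	return false
}

// isExcluded reproduces the static decision: a resource type is always
// watched unless it matches a configured exclusion. The lists are fixed at
// configuration time, which is the limitation this spike wants to remove.
func isExcluded(exclusions []filteredResource, group, kind string) bool {
	for _, e := range exclusions {
		if matchAny(e.APIGroups, group) && matchAny(e.Kinds, kind) {
			return true
		}
	}
	return false
}

func main() {
	exclusions := []filteredResource{
		{APIGroups: []string{"tekton.dev"}, Kinds: []string{"*"}},
	}
	fmt.Println(isExcluded(exclusions, "tekton.dev", "TaskRun")) // true: never watched
	fmt.Println(isExcluded(exclusions, "apps", "Deployment"))    // false: always watched
}
```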
Out of scope
None
Approach (Required)
- Go through the cache implementation in gitops-engine
- To overcome both the number of watches created and the static nature of the resource.inclusions / resource.exclusions settings, it is preferable to have ArgoCD determine which resource types are managed by Argo applications and create watches only for those specific types. This reduces both the number of network connections opened to the API server and the cache memory usage of the application controller.
- The changes should be made in the ClusterCache code in the gitops-engine code base. Maintain two sets of API resources: one with everything available in the cluster, and one with the resources managed via an Argo application. Create watches only for those resource types that are managed by some Argo application, as sketched below.
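A minimal Go sketch of this two-set idea, under the assumption that the cache is notified whenever the set of app-managed types changes. All names here are hypothetical; the real change would live in the gitops-engine ClusterCache:

```go
package main

import (
	"fmt"
	"sync"
)

// GVK identifies a resource type; it stands in for apimachinery's
// schema.GroupVersionKind so the sketch has no external dependencies.
type GVK struct{ Group, Version, Kind string }

// watchPlanner sketches the proposed behaviour: keep the full set of API
// types the cluster advertises, plus the set referenced by Argo
// applications, and only hold watches open for the intersection.
type watchPlanner struct {
	mu         sync.Mutex
	available  map[GVK]bool   // every type discovery returns
	watches    map[GVK]func() // open watches; the func stops the watch
	startWatch func(GVK) func()
}

// SetManaged would be called whenever the set of app-managed types changes
// (e.g. after an application refresh). It opens missing watches and stops
// stale ones, so API-server connections track what applications actually use.
func (p *watchPlanner) SetManaged(gvks []GVK) {
	p.mu.Lock()
	defer p.mu.Unlock()
	next := map[GVK]bool{}
	for _, gvk := range gvks {
		if p.available[gvk] { // ignore types the cluster does not serve
			next[gvk] = true
		}
	}
	for gvk := range next { // open watches for newly managed types
		if _, ok := p.watches[gvk]; !ok {
			p.watches[gvk] = p.startWatch(gvk)
		}
	}
	for gvk, stop := range p.watches { // close watches no app needs anymore
		if !next[gvk] {
			stop()
			delete(p.watches, gvk)
		}
	}
}

func main() {
	p := &watchPlanner{
		available: map[GVK]bool{
			{"apps", "v1", "Deployment"}: true,
			{"", "v1", "ConfigMap"}:      true,
			{"tekton.dev", "v1", "Task"}: true,
		},
		watches: map[GVK]func(){},
		startWatch: func(gvk GVK) func() {
			fmt.Println("watch opened:", gvk.Kind)
			return func() { fmt.Println("watch closed:", gvk.Kind) }
		},
	}
	p.SetManaged([]GVK{{"apps", "v1", "Deployment"}, {"", "v1", "ConfigMap"}})
	p.SetManaged([]GVK{{"apps", "v1", "Deployment"}}) // the ConfigMap watch is closed
}
```

Stopping watches when the last application stops managing a type is what keeps the connection count proportional to actual usage rather than to the number of CRDs installed in the cluster.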
Dependencies
Related upstream issues:
https://github.com/argoproj/argo-cd/issues/6561
https://github.com/argoproj/argo-cd/issues/17236
Acceptance Criteria (Mandatory)
- Conduct thorough research on the current resource inclusion/exclusion mechanisms in ArgoCD and how they can be adapted for dynamic resource filtering.
- Explore how to construct a hierarchical resource tree to enable dynamic creation of watches for child resources. For instance, if an ArgoCD application deploys a Deployment, the system should automatically establish watches for related child resources such as ReplicaSets and Pods (see the sketch after this list).
- Evaluate the impact of dynamic resource filtering on ArgoCD's performance.
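To illustrate the kind of hierarchy the second criterion asks about, here is a small Go sketch that derives parent-to-child kind relationships from owner references. It is illustrative only; gitops-engine builds its resource tree differently, and this ignores API groups and versions:

```go
package main

import "fmt"

// ownerRef is a stand-in for metav1.OwnerReference: a child resource points
// back at the kind that created it.
type ownerRef struct{ Kind string }

// resource is a stripped-down cached object: its own kind plus its owners.
type resource struct {
	Kind   string
	Owners []ownerRef
}

// childKinds walks cached resources and collects, per parent kind, the child
// kinds observed via owner references. In the envisioned dynamic mode,
// seeing a managed Deployment would lead the cache to also watch ReplicaSets
// and Pods, because they appear as (transitive) children.
func childKinds(objs []resource) map[string][]string {
	tree := map[string][]string{}
	for _, obj := range objs {
		for _, owner := range obj.Owners {
			tree[owner.Kind] = append(tree[owner.Kind], obj.Kind)
		}
	}
	return tree
}

func main() {
	cluster := []resource{
		{Kind: "Deployment"},
		{Kind: "ReplicaSet", Owners: []ownerRef{{Kind: "Deployment"}}},
		{Kind: "Pod", Owners: []ownerRef{{Kind: "ReplicaSet"}}},
	}
	fmt.Println(childKinds(cluster))
	// map[Deployment:[ReplicaSet] ReplicaSet:[Pod]]
}
```

One question the research should answer is the bootstrap problem: owner references on children only become visible once something lists or watches those child types, so the cache may need an initial discovery pass or a set of well-known parent/child kind pairs.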
INVEST Checklist
Dependencies identified
Blockers noted and expected delivery timelines set
Design is implementable
Acceptance criteria agreed upon
Story estimated
Done Checklist
- Code is completed, reviewed, documented and checked in
- Unit and integration test automation has been delivered and is running cleanly in the continuous integration/staging/canary environment
- Continuous Delivery pipeline(s) can proceed with the new code included
- Customer facing documentation, API docs etc. are produced/updated, reviewed and published
- Acceptance criteria are met
- clones: GITOPS-3713 Analyze resource usage in a typical Cluster scoped ArgoCD instance based setup (Closed)
- is cloned by: GITOPS-4644 Performance Tests for baselining OpenShift GitOps performance (Closed)