Story
Resolution: Done
Story Points: 13
GitOps Tangerine - Sprint 3258, GitOps Tangerine - Sprint 3259, GitOps Tangerine - Sprint 3261, GitOps Tangerine - Sprint 3262
Story (Required)
Create automated tests based on kube-burner that can create many namespaces and Argo Applications that deploy resources into those destination namespaces. The goal of the tests is to capture metrics that are critical to ArgoCD performance: CPU, memory, threads, and IO.
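As a rough illustration of the kind of data to collect, the sketch below scrapes a component's Prometheus metrics endpoint for the standard process metrics that correspond to CPU, memory, threads, and file-descriptor (IO) usage. The endpoint URL (http://localhost:8082/metrics) is an assumption and would typically require a port-forward to the application controller; this is not part of the kube-burner tests themselves.

```go
// Minimal sketch, assuming the application controller's metrics endpoint is
// reachable at the URL below (an assumption; adjust host/port or port-forward
// as needed). It prints a few standard process metrics exposed by Go
// components that use the Prometheus client library.
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

func main() {
	// Assumed endpoint; not a documented default for every deployment.
	resp, err := http.Get("http://localhost:8082/metrics")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Standard collectors: CPU, resident memory, OS threads, open FDs.
	wanted := []string{
		"process_cpu_seconds_total",
		"process_resident_memory_bytes",
		"go_threads",
		"process_open_fds",
	}
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := scanner.Text()
		for _, name := range wanted {
			if strings.HasPrefix(line, name) {
				fmt.Println(line)
			}
		}
	}
	if err := scanner.Err(); err != nil {
		panic(err)
	}
}
```

In practice kube-burner gathers equivalent data through its Prometheus integration; the snippet is only meant to show which signals the tests are interested in.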
Background (Required)
In an OpenShift-based setup (or a similar Kubernetes setup) with a large number of CRDs (~200), not all CRDs need to be managed by ArgoCD. The current implementation of the cluster cache creates a watch for each resource type per namespace, which opens too many watch connections to the API server. This causes client-side throttling, as shown in the error message below:
I0117 11:37:10.038643 1 request.go:601] Waited for 1.001246788s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/api/v1/namespaces/test-ns-011/secrets?limit=500...
When we tested with ~100 namespaces, we observed that too many watches were created and requests were throttled. This issue can be partially addressed by setting the resource.inclusions and resource.exclusions fields, but since these are static, users have to know in advance exactly which resource types ArgoCD will need to manage.
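To see why the watch count explodes, the following stand-alone sketch (not ArgoCD code; it assumes a kubeconfig at the default path) multiplies the number of namespaced, watchable resource types served by the cluster by the number of namespaces, which approximates the number of watch connections a per-namespace, per-type cache would open.

```go
// Illustrative estimate of the potential watch count; not how ArgoCD itself
// enumerates resources. Assumes a kubeconfig at the default location.
package main

import (
	"context"
	"fmt"
	"strings"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/discovery"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}

	// Count namespaced, watchable resource types exposed by the cluster.
	disco, err := discovery.NewDiscoveryClientForConfig(config)
	if err != nil {
		panic(err)
	}
	_, lists, err := disco.ServerGroupsAndResources()
	if err != nil {
		panic(err)
	}
	resourceTypes := 0
	for _, list := range lists {
		for _, r := range list.APIResources {
			// Skip subresources such as "pods/log".
			if r.Namespaced && !strings.Contains(r.Name, "/") && containsVerb(r.Verbs, "watch") {
				resourceTypes++
			}
		}
	}

	// Count namespaces.
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	nsList, err := clientset.CoreV1().Namespaces().List(context.Background(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}

	// A watch per resource type per namespace grows multiplicatively.
	fmt.Printf("%d resource types x %d namespaces = %d potential watches\n",
		resourceTypes, len(nsList.Items), resourceTypes*len(nsList.Items))
}

func containsVerb(verbs []string, verb string) bool {
	for _, v := range verbs {
		if v == verb {
			return true
		}
	}
	return false
}
```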
Out of scope
None
Approach (Required)
- Go through the cache implementation in gitops-engine.
- To reduce the number of watches that are created, and to overcome the static nature of the resource.inclusions / resource.exclusions settings, it is preferable to have ArgoCD determine which resource types are being managed by Argo applications and create watches only for those specific types. This will reduce the number of network connections opened to the API server and also reduce the cache memory usage of the application controller.
- The changes should be made in the ClusterCache code in the gitops-engine code base. Keep two sets of API resources: one containing everything available in the cluster, and another containing only the resources managed by an Argo application. Create watches only for those resource types that are managed by at least one Argo application (see the sketch after this list).
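A minimal sketch of that idea, assuming the set of managed resource types has already been derived from the Argo applications. The package, function, and variable names (clustercache, startManagedWatches, managedGVRs) are illustrative and are not part of the gitops-engine API.

```go
// Hypothetical sketch: open watches only for resource types that appear in at
// least one Argo application, instead of every type the cluster serves.
package clustercache

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/rest"
)

// startManagedWatches opens one watch per managed resource type in the given
// namespace; managedGVRs is assumed to be derived from the Argo applications.
func startManagedWatches(ctx context.Context, config *rest.Config, namespace string, managedGVRs []schema.GroupVersionResource) error {
	client, err := dynamic.NewForConfig(config)
	if err != nil {
		return err
	}
	for _, gvr := range managedGVRs {
		w, err := client.Resource(gvr).Namespace(namespace).Watch(ctx, metav1.ListOptions{})
		if err != nil {
			return fmt.Errorf("watch %s: %w", gvr.String(), err)
		}
		go func(gvr schema.GroupVersionResource) {
			defer w.Stop()
			for event := range w.ResultChan() {
				// A real implementation would feed these events into the
				// cluster cache; here we only log them.
				fmt.Printf("%s %s event: %s\n", namespace, gvr.Resource, event.Type)
			}
		}(gvr)
	}
	return nil
}
```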
Dependencies
Related issues upstream
https://github.com/argoproj/argo-cd/issues/6561
https://github.com/argoproj/argo-cd/issues/17236
Acceptance Criteria (Mandatory)
- Automated tests using kube-burner should be created. These tests should:
  - Create a load-testing setup for a cluster-scoped instance.
  - Create a large number of namespaces and Argo applications.
  - Deploy resources in these namespaces.
  - Capture and report metrics such as CPU, memory, threads, and IO usage.
  - Capture the time delay for application sync (see the sketch after this list).
- The kube-burner configuration and the findings of the test should be added to the gitops repository.
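Measuring the sync delay could look roughly like the sketch below, which polls an Application CR through the dynamic client until status.sync.status is Synced and status.health.status is Healthy. The field paths and the applications GVR match the Application CRD, but the package and function names and the 5-second polling interval are illustrative choices, not an existing kube-burner or ArgoCD helper.

```go
// Hypothetical helper: return how long an Argo CD Application took to become
// Synced and Healthy after the measurement started.
package measure

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/rest"
)

var applicationGVR = schema.GroupVersionResource{
	Group:    "argoproj.io",
	Version:  "v1alpha1",
	Resource: "applications",
}

// waitForSync polls the Application until status.sync.status is "Synced" and
// status.health.status is "Healthy", returning the elapsed time.
func waitForSync(ctx context.Context, config *rest.Config, namespace, name string) (time.Duration, error) {
	client, err := dynamic.NewForConfig(config)
	if err != nil {
		return 0, err
	}
	start := time.Now()
	ticker := time.NewTicker(5 * time.Second)
	defer ticker.Stop()
	for {
		app, err := client.Resource(applicationGVR).Namespace(namespace).Get(ctx, name, metav1.GetOptions{})
		if err != nil {
			return 0, err
		}
		syncStatus, _, _ := unstructured.NestedString(app.Object, "status", "sync", "status")
		healthStatus, _, _ := unstructured.NestedString(app.Object, "status", "health", "status")
		if syncStatus == "Synced" && healthStatus == "Healthy" {
			return time.Since(start), nil
		}
		select {
		case <-ctx.Done():
			return 0, fmt.Errorf("waiting for %s/%s: %w", namespace, name, ctx.Err())
		case <-ticker.C:
		}
	}
}
```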
INVEST Checklist
Dependencies identified
Blockers noted and expected delivery timelines set
Design is implementable
Acceptance criteria agreed upon
Story estimated
Done Checklist
- Code is completed, reviewed, documented and checked in
- Unit and integration test automation have been delivered and are running cleanly in the continuous integration/staging/canary environment
- Continuous Delivery pipeline(s) is able to proceed with new code included
- Customer facing documentation, API docs etc. are produced/updated, reviewed and published
- Acceptance criteria are met
- clones: GITOPS-4114 Dynamic resource filtering for caching only those resources that are managed by ArgoCD (Closed)
- is cloned by: GITOPS-5482 Enhance Kube Burner scripts to wait for ArgoCD application status (Closed)