-
Bug
-
Resolution: Done
-
Major
-
1.13.0, 1.12.0, 1.10.0, 1.14.0
-
8
-
False
-
None
-
False
-
GitOps Tangerine - Sprint 3260, GitOps Tangerine - Sprint 3265, GitOps Tangerine - Sprint 3266
Severity: Sev 3
Customer case #: 03632032
OCP Version: 4.12.30
GitOps Operator Version: v1.10.0
What is the impact of your issue: Unnecessarily requested resources by unused application-controller pods on multiple ArgoCD instances due to lack of dynamic scaling
Slack thread: https://redhat-internal.slack.com/archives/CMP95ST2N/p1697474720786429
Description of problem:
The client is using the new dynamic scaling feature for the application controller in OpenShift GitOps 1.10. With dynamic scaling enabled, the application-controller pods show extremely high resource usage compared with the previous static scaling and legacy sharding. The client's ArgoCD instance manages 368 applications spread over 8 clusters. The following controller configuration, which aligns with the upstream configuration options, is set when dynamic scaling is in use and results in the high resource usage:
spec:
  controller:
    logFormat: json
    logLevel: warn
    processors:
      operation: 100
      status: 100
    sharding:
      dynamicScalingEnabled: true
      minShards: 1
      maxShards: 10
      clustersPerShard: 1
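To make the sharding parameters above easier to reason about, here is a minimal sketch of how a desired shard count could be derived from them. It assumes the count is the number of managed clusters divided by clustersPerShard, rounded up and clamped to the range minShards..maxShards. The function name `desired_shards` and the rounding choice are assumptions for illustration; the operator's actual computation may differ.

```python
import math


def desired_shards(cluster_count, clusters_per_shard, min_shards, max_shards):
    """Hypothetical helper: estimate the shard (replica) count.

    Assumption: clusters / clustersPerShard, rounded up, then clamped
    to [minShards, maxShards]. Not taken from the operator source.
    """
    # Guard against nonsensical settings so the division is always valid.
    clusters_per_shard = max(clusters_per_shard, 1)
    min_shards = max(min_shards, 1)
    max_shards = max(max_shards, min_shards)

    wanted = math.ceil(cluster_count / clusters_per_shard)
    return min(max(wanted, min_shards), max_shards)


# Values from this report: 8 clusters, clustersPerShard: 1, minShards: 1.
print(desired_shards(8, 1, 1, 10))  # maxShards: 10 -> 8
print(desired_shards(8, 1, 1, 7))   # maxShards: 7  -> 7
```

Under this assumption, clustersPerShard: 1 yields one controller pod per managed cluster until maxShards is reached.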
Resource consumption of the pods after enabling dynamic scaling:
$ oc adm top pod
NAME                              CPU(cores)   MEMORY(bytes)
argocd-application-controller-0   2265m        8921Mi
argocd-application-controller-1   1673m        8074Mi
argocd-application-controller-2   3913m        8695Mi
argocd-application-controller-3   4100m        9011Mi
argocd-application-controller-4   3121m        9346Mi
argocd-application-controller-5   2893m        9089Mi
argocd-application-controller-6   3073m        8293Mi
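For a sense of the aggregate footprint, the per-pod figures above can be summed. The sketch below parses the listing as plain text; the helper name `total_usage` is hypothetical, and it assumes CPU is always reported in millicores (`m`) and memory in `Mi`, as in this output.

```python
TOP_OUTPUT = """\
NAME CPU(cores) MEMORY(bytes)
argocd-application-controller-0 2265m 8921Mi
argocd-application-controller-1 1673m 8074Mi
argocd-application-controller-2 3913m 8695Mi
argocd-application-controller-3 4100m 9011Mi
argocd-application-controller-4 3121m 9346Mi
argocd-application-controller-5 2893m 9089Mi
argocd-application-controller-6 3073m 8293Mi
"""


def total_usage(top_output):
    """Hypothetical helper: return (pod_count, cpu_millicores, memory_mi)."""
    pods = cpu_m = mem_mi = 0
    # Skip the header row; each remaining row is "name cpu memory".
    for line in top_output.strip().splitlines()[1:]:
        _name, cpu, mem = line.split()
        if not cpu.endswith("m") or not mem.endswith("Mi"):
            raise ValueError(f"unexpected unit in line: {line!r}")
        cpu_m += int(cpu[:-1])
        mem_mi += int(mem[:-2])
        pods += 1
    return pods, cpu_m, mem_mi


print(total_usage(TOP_OUTPUT))  # (7, 21038, 61429)
```

Across the seven pods this comes to roughly 21 CPU cores and 60 GiB of memory in total.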
With the following configuration, which was used before the dynamic scaling feature was available, resource usage is as expected:
spec:
  controller:
    logFormat: json
    logLevel: warn
    processors:
      operation: 100
      status: 100
    sharding:
      enabled: true
      replicas: 7
References:
- https://issues.redhat.com/browse/GITOPS-3338
- https://developers.redhat.com/articles/2023/09/26/dynamically-scale-argo-cd-application-controller-openshift-gitops-110
- https://issues.redhat.com/browse/GITOPS-2058
- https://argocd-operator.readthedocs.io/en/latest/reference/argocd/#controller-options
We found a related upstream issue: https://github.com/argoproj/argo-cd/issues/8175 cp-argocd
The ArgoCD YAML is attached to this issue.
Workaround:
After reverting the dynamic scaling configuration, resource usage returns to normal, expected values.
Steps to Reproduce
- Deploy an ArgoCD instance that manages at least 2 clusters
- Configure the dynamic scaling feature like this:
  sharding:
    clustersPerShard: 1
    dynamicScalingEnabled: true
    maxShards: 7
    minShards: 1
Actual results:
Expected results:
Reproducibility (Always/Intermittent/Only Once):
- is cloned by
-
GITOPS-5811 ArgoCD application controller has extremely high resource consumption with dynamic scaling
- Closed
- links to
-
RHBA-2024:142378 Errata Advisory for Red Hat OpenShift GitOps v1.14.2