OpenShift GitOps / GITOPS-3465

ArgoCD application controller has extremely high resource consumption with dynamic scaling


    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • Fix Version/s: 1.14.2
    • Affects Version/s: 1.13.0, 1.12.0, 1.10.0, 1.14.0
    • Component: Operator
    • Sprints: GitOps Tangerine - Sprint 3260, GitOps Tangerine - Sprint 3265, GitOps Tangerine - Sprint 3266

      Severity: Sev 3
      Customer case #: 03632032
      OCP Version: 4.12.30
      GitOps Operator Version: v1.10.0
      What is the impact of your issue: Unnecessarily requested resources by unused application-controller pods on multiple ArgoCD instances due to lack of dynamic scaling
      Slack thread: https://redhat-internal.slack.com/archives/CMP95ST2N/p1697474720786429 

      Description of problem:

      Client is using the new dynamic scaling feature for the application controller in OpenShift GitOps 1.10. Client experiences extremely high resource usage on the application-controller pods when using dynamic scaling, as compared to the previous static scaling and legacy sharding. Client has an Argo CD instance which manages 368 applications spread across 8 clusters. The following controller configuration, which aligns with the upstream configuration options, is set when using dynamic scaling and causes the high resource usage:

      spec:
       controller:
         logFormat: json
         logLevel: warn
         processors:
           operation: 100
           status: 100
         sharding:
           dynamicScalingEnabled: true
           minShards: 1
           maxShards:  10
           clustersPerShard: 1
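For reference, the effective shard count under dynamic scaling follows the documented semantics of these fields: the number of managed clusters divided by clustersPerShard, clamped into [minShards, maxShards]. A minimal sketch of that calculation (the function name is illustrative, not the operator's actual code):

```python
# Hedged sketch of how the operator derives the controller replica count
# under dynamic scaling, based on the documented minShards/maxShards/
# clustersPerShard semantics; the exact operator implementation may differ.
import math

def desired_shards(clusters: int, clusters_per_shard: int,
                   min_shards: int, max_shards: int) -> int:
    """Clamp ceil(clusters / clustersPerShard) into [minShards, maxShards]."""
    wanted = math.ceil(clusters / clusters_per_shard)
    return max(min_shards, min(wanted, max_shards))

# The customer's instance: 8 clusters, clustersPerShard: 1,
# minShards: 1, maxShards: 10 -> 8 controller replicas.
print(desired_shards(8, 1, 1, 10))  # -> 8
```

With the configuration above, the observed 7-8 controller pods are therefore expected; the bug is in their per-pod consumption, not in the replica count itself.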

      Pod resource consumption after enabling dynamic scaling:

      $ oc adm top pod
      NAME                              CPU(cores)   MEMORY(bytes)
      argocd-application-controller-0   2265m        8921Mi
      argocd-application-controller-1   1673m        8074Mi
      argocd-application-controller-2   3913m        8695Mi
      argocd-application-controller-3   4100m        9011Mi
      argocd-application-controller-4   3121m        9346Mi
      argocd-application-controller-5   2893m        9089Mi
      argocd-application-controller-6   3073m        8293Mi
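Tallying the figures above makes the aggregate footprint of the seven controller pods explicit (values copied verbatim from the `oc adm top pod` output; the script is just arithmetic):

```python
# Aggregate CPU (millicores) and memory (Mi) across the seven
# application-controller pods reported under dynamic scaling.
pods = {
    "argocd-application-controller-0": (2265, 8921),
    "argocd-application-controller-1": (1673, 8074),
    "argocd-application-controller-2": (3913, 8695),
    "argocd-application-controller-3": (4100, 9011),
    "argocd-application-controller-4": (3121, 9346),
    "argocd-application-controller-5": (2893, 9089),
    "argocd-application-controller-6": (3073, 8293),
}
total_cpu_m = sum(cpu for cpu, _ in pods.values())
total_mem_mi = sum(mem for _, mem in pods.values())
print(f"{total_cpu_m}m CPU, {total_mem_mi}Mi memory")  # -> 21038m CPU, 61429Mi memory
```

That is roughly 21 cores and 60Gi of memory for 368 applications across 8 clusters, which is what the client reports as abnormal relative to static sharding.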
       

      With the following configuration, which was used before the dynamic scaling feature was available, resource usage is as expected:

      spec:
       controller:
         logFormat: json
         logLevel: warn
         processors:
           operation: 100
           status: 100
         sharding:
           enabled: true
           replicas: 7
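For context, under this static configuration each cluster is pinned to one of the 7 replicas by the legacy distribution function, which hashes the cluster id modulo the replica count (upstream uses a 32-bit FNV-1a hash; this is a rough sketch, not the operator's exact code):

```python
# Hedged sketch of Argo CD's legacy shard assignment: FNV-1a hash of the
# cluster id, taken modulo the replica count. Cluster ids are illustrative.
def fnv32a(data: bytes) -> int:
    """32-bit FNV-1a hash."""
    h = 0x811C9DC5
    for b in data:
        h ^= b
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h

def legacy_shard(cluster_id: str, replicas: int) -> int:
    return fnv32a(cluster_id.encode()) % replicas

# Example: map a hypothetical cluster id onto one of 7 shards.
print(legacy_shard("https://cluster-1.example.com:6443", 7))
```

Because the assignment is a pure function of the cluster id and replica count, each pod only reconciles the clusters hashed to its own shard, which matches the stable per-pod usage the client saw before enabling dynamic scaling.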

       

       

      References:

      Related upstream issue: https://github.com/argoproj/argo-cd/issues/8175

      Please find the argocd yaml attached to this issue.

      Workaround:
      After reverting dynamic scaling, resource usage returns to the normal, expected values.

      Steps to Reproduce

      • Deploy ArgoCD instance which manages at least 2 clusters
      • Configure dynamic scaling feature like this:
        sharding:
          clustersPerShard: 1
          dynamicScalingEnabled: true
          maxShards: 7
          minShards: 1

      Actual results:
      Application-controller pods consume extremely high CPU and memory (roughly 2-4 cores and ~9Gi per pod) when dynamic scaling is enabled.

      Expected results:
      Resource usage comparable to the previous static sharding configuration with an equivalent replica count.

      Reproducibility (Always/Intermittent/Only Once):

              isequeir@redhat.com Ishita Sequeira
              rescott1 Regina Scott (Inactive)