Type: Bug
Resolution: Done
Priority: Critical
Version: 1.8.2
Description of problem:
The application controller is prone to a deadlock in which all available status processors become occupied and are never released.
I believe this happens when an Application that has a resource finalizer set (i.e., Argo CD should also tear down the managed resources) is being deleted, and the Application's target cluster is removed while pruning is still in progress.
The only way to recover the application controller is to restart the pod.
The issue is difficult to reproduce.
Prerequisites (if any, like setup, operators/versions):
We've seen this happening in customer setups under the following conditions:
- The customer is using ACM and ApplicationSet with the ClusterDecisionResource (CDR) generator
- The CDR (via ApplicationSet) deletes the managed Applications targeting this cluster BEFORE the cluster has been removed from Argo CD. As a result, the resource finalizer, which ApplicationSet sets by default, is not removed.
- ACM removes the GitOpsCluster resource, triggering the removal of Argo CD's cluster configuration
However, I believe this does not happen every time, so a timing issue is likely involved.
While this situation is not ideal, it should result in a recoverable error rather than a complete halt of the application controller.
Steps to Reproduce
Unknown to this point.
Actual results:
Controller deadlocks without possibility of recovery
Expected results:
Controller does not deadlock; application's resource deletion fails gracefully
Reproducibility (Always/Intermittent/Only Once):
Random
Acceptance criteria:
- Issue is reproduced in local dev environment
- When using GitOps through ACM, addition/deletion of clusters should not result in app-controller getting stuck and/or crashing
- e2e test that simulates ACM behavior of adding/removing clusters to verify that core issue is addressed
Definition of Done:
- Acceptance criteria are met
Build Details:
Additional info (Such as Logs, Screenshots, etc):
- is cloned by: GITOPS-3052 Argo CD application controller stops reconciling under certain circumstances (Closed)
- is duplicated by: GITOPS-2673 Argo CD Application controller is stuck Syncing applications (Closed)
- is related to: GITOPS-2782 OpenShift GitOps Performance Issue (v1.9.1) (Closed)
- is related to: GITOPS-3192 OpenShift GitOps Performance Issue (v1.8.4) (Closed)
- links to