-
Bug
-
Resolution: Not a Bug
-
Critical
-
None
-
1.11.2, 1.12.3
-
GitOps Scarlet - Sprint 3262, GitOps Scarlet - Sprint 3263
-
Important
Description of problem:
The customer uses Argo CD to manage a moderate number of namespaces (50-70). Adding a new namespace that is watched by an Argo CD instance (label "argocd.argoproj.io/managed-by") reproducibly causes Argo CD to lock up.
When Argo CD detects the new namespaces, the `argocd-application-controller` logs the following messages and stops syncing:
E0528 08:11:05.560835 1 retrywatcher.go:130] "Watch failed" err="context canceled"
E0528 08:11:05.560958 1 retrywatcher.go:130] "Watch failed" err="context canceled"
E0528 08:11:05.561083 1 retrywatcher.go:130] "Watch failed" err="context canceled"
E0528 08:11:05.561145 1 retrywatcher.go:130] "Watch failed" err="context canceled"
E0528 08:11:05.561179 1 retrywatcher.go:130] "Watch failed" err="context canceled"
A similar issue has already been discussed in GITOPS-4440. In addition, the metric `app_reconciliation_queue` increases and stays at the elevated level until the Argo CD Application Controller is restarted.
Because this locks up Argo CD, it is considered a critical issue. The workaround is to restart the `argocd-application-controller`.
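To help confirm the symptom from a saved controller log, the following is a minimal sketch that counts the "Watch failed" entries per second. It assumes the log lines follow the format shown above; the function name `count_watch_failures` and the sample text are illustrative only and are not part of Argo CD.

```python
import re
from collections import Counter

# Matches error lines of the form quoted in this report, e.g.:
# E0528 08:11:05.560835 1 retrywatcher.go:130] "Watch failed" err="context canceled"
WATCH_FAILED = re.compile(
    r'E(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2})\.\d+\s+\d+\s+'
    r'retrywatcher\.go:\d+\] "Watch failed" err="context canceled"'
)


def count_watch_failures(log_text: str) -> Counter:
    """Count 'Watch failed' entries per second, keyed by 'MMDD HH:MM:SS'.

    finditer is used so the count is correct whether the entries are on
    separate lines or run together on one line.
    """
    counts = Counter()
    for match in WATCH_FAILED.finditer(log_text):
        counts[f"{match.group('date')} {match.group('time')}"] += 1
    return counts


if __name__ == "__main__":
    # Sample text copied from the log excerpt in this report.
    sample = (
        'E0528 08:11:05.560835 1 retrywatcher.go:130] "Watch failed" err="context canceled"\n'
        'E0528 08:11:05.560958 1 retrywatcher.go:130] "Watch failed" err="context canceled"\n'
    )
    for second, count in sorted(count_watch_failures(sample).items()):
        print(second, count)
```

A burst of many entries within the same second, followed by no further sync activity, would match the behavior described here.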
Prerequisites (if any, like setup, operators/versions):
OpenShift Container Platform 4.15.14
openshift-gitops-operator.v1.12.3
Steps to Reproduce
- Follow the steps in the reproducer: https://github.com/simonkrenger/knockout-argocd/tree/main
- Observe that after step 4 all Applications are healthy and can be synced
- Observe the logs after step 6
Actual results:
Argo CD locks up and no longer syncs. The log shows the following error messages:
E0528 08:11:05.560835 1 retrywatcher.go:130] "Watch failed" err="context canceled"
E0528 08:11:05.560958 1 retrywatcher.go:130] "Watch failed" err="context canceled"
E0528 08:11:05.561083 1 retrywatcher.go:130] "Watch failed" err="context canceled"
E0528 08:11:05.561145 1 retrywatcher.go:130] "Watch failed" err="context canceled"
E0528 08:11:05.561179 1 retrywatcher.go:130] "Watch failed" err="context canceled"
Expected results:
When creating new namespaces with the "argocd.argoproj.io/managed-by" label, the new apps are synced as expected and Argo CD does not lock up.
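For reference, a namespace carrying this label could be generated as in the sketch below. The namespace name `team-a-dev` and the Argo CD instance namespace `openshift-gitops` are hypothetical placeholders, and the helper `managed_namespace` is illustrative only; the reproducer linked in this issue may create its namespaces differently.

```python
import json

MANAGED_BY_LABEL = "argocd.argoproj.io/managed-by"


def managed_namespace(name: str, argocd_namespace: str) -> dict:
    """Build a Namespace manifest labeled as managed by an Argo CD instance.

    The label value is assumed to be the namespace in which the managing
    Argo CD instance runs.
    """
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": name,
            "labels": {MANAGED_BY_LABEL: argocd_namespace},
        },
    }


if __name__ == "__main__":
    # Both names below are placeholders, not values from the customer setup.
    manifest = managed_namespace("team-a-dev", "openshift-gitops")
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied to a test cluster to create one additional watched namespace.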
Reproducibility (Always/Intermittent/Only Once):
Always, when using the reproducer
Additional info (Such as Logs, Screenshots, etc):
- Reproducer available in https://github.com/simonkrenger/knockout-argocd/tree/main
- "must-gather" available in attached Support Case
- GITOPS-4440 describes a similar issue