  1. OpenShift GitOps
  2. GITOPS-4757

Adding new watched namespaces may lock up Argo CD


    • GitOps Scarlet - Sprint 3262, GitOps Scarlet - Sprint 3263
    • Important

      Description of problem:

      The customer is using Argo CD to manage a moderate number of namespaces (50-70). Adding a new namespace that is watched by an Argo CD instance (label "argocd.argoproj.io/managed-by") reproducibly risks locking up Argo CD.
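For illustration, the kind of Namespace object that triggers the behavior might be built as in the sketch below. It assumes the label value is the namespace of the managing Argo CD instance; the names "team-a" and "openshift-gitops" are placeholders and do not come from the customer environment.

```python
import json

# Label that marks a namespace as watched by an Argo CD instance (from the report).
MANAGED_BY_LABEL = "argocd.argoproj.io/managed-by"


def managed_namespace(name: str, argocd_namespace: str) -> dict:
    """Build a Namespace manifest carrying the managed-by label.

    Assumption: the label value is the namespace of the Argo CD instance.
    """
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": name,
            "labels": {MANAGED_BY_LABEL: argocd_namespace},
        },
    }


if __name__ == "__main__":
    # Placeholder names; the JSON output can be piped to `oc apply -f -`.
    print(json.dumps(managed_namespace("team-a", "openshift-gitops"), indent=2))
```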

      When Argo CD detects the new namespaces, the `argocd-application-controller` will show the following messages and will no longer sync:

      E0528 08:11:05.560835 1 retrywatcher.go:130] "Watch failed" err="context canceled"
      E0528 08:11:05.560958 1 retrywatcher.go:130] "Watch failed" err="context canceled"
      E0528 08:11:05.561083 1 retrywatcher.go:130] "Watch failed" err="context canceled"
      E0528 08:11:05.561145 1 retrywatcher.go:130] "Watch failed" err="context canceled"
      E0528 08:11:05.561179 1 retrywatcher.go:130] "Watch failed" err="context canceled"
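To spot this condition in collected controller logs, a small filter along these lines may help. It is only a sketch: the regular expression is derived from the five log lines above, and the function name is hypothetical.

```python
import re

# Matches klog-style lines such as:
# E0528 08:11:05.560835 1 retrywatcher.go:130] "Watch failed" err="context canceled"
WATCH_FAILED = re.compile(
    r'^E\d{4} \d{2}:\d{2}:\d{2}\.\d+\s+\d+ retrywatcher\.go:\d+\] '
    r'"Watch failed" err="context canceled"'
)


def count_watch_failures(lines):
    """Return how many log lines report a canceled watch."""
    return sum(1 for line in lines if WATCH_FAILED.match(line.strip()))
```

A burst of matches right after a namespace is labeled would be consistent with the behavior described here, although the count alone does not prove that the controller is locked up.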

      A similar issue has already been discussed in GITOPS-4440. It can also be observed that the metric `app_reconciliation_queue` increases and stays at this level until the Argo CD Application Controller is restarted.
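The "increases and stays at this level" pattern can be checked against a series of scraped values for the queue metric. The sketch below is a heuristic only; the function name and the `min_flat` threshold are assumptions, and how the samples are collected is left open.

```python
def queue_is_stuck(samples, min_flat=3):
    """Return True if the last `min_flat` samples are identical and non-zero.

    `samples` is a chronological list of queue-depth values. A healthy
    controller is expected to drain the queue, so a flat non-zero tail
    suggests that reconciliation has stopped.
    """
    if len(samples) < min_flat:
        return False
    tail = samples[-min_flat:]
    return tail[0] > 0 and all(value == tail[0] for value in tail)
```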

      Because this locks up Argo CD, it is considered a critical issue. The workaround is to restart the `argocd-application-controller`.
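The restart workaround could be scripted roughly as follows. This sketch assumes the application controller runs as a StatefulSet and that `oc` is on the PATH and logged in; the namespace and StatefulSet names must be supplied by the caller and are not taken from this report.

```python
import subprocess


def restart_command(namespace: str, statefulset: str) -> list:
    """Build the `oc` argv that triggers a rolling restart of the controller."""
    return ["oc", "rollout", "restart", f"statefulset/{statefulset}", "-n", namespace]


def restart_controller(namespace: str, statefulset: str) -> None:
    """Run the restart; raises CalledProcessError if `oc` exits non-zero."""
    subprocess.run(restart_command(namespace, statefulset), check=True)
```

For a default instance the call might look like `restart_controller("openshift-gitops", "openshift-gitops-application-controller")`, but both names should be verified on the cluster first.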
       

      Prerequisites (if any, like setup, operators/versions):

      OpenShift Container Platform 4.15.14
      openshift-gitops-operator.v1.12.3

      Steps to Reproduce

      1. Follow the steps in the reproducer: https://github.com/simonkrenger/knockout-argocd/tree/main
      2. Observe that after step 4 all Applications are healthy and can be synced
      3. Observe the logs after step 6
         

      Actual results:

      Argo CD locks up and no longer syncs. The log shows the following error messages:

      E0528 08:11:05.560835 1 retrywatcher.go:130] "Watch failed" err="context canceled"
      E0528 08:11:05.560958 1 retrywatcher.go:130] "Watch failed" err="context canceled"
      E0528 08:11:05.561083 1 retrywatcher.go:130] "Watch failed" err="context canceled"
      E0528 08:11:05.561145 1 retrywatcher.go:130] "Watch failed" err="context canceled"
      E0528 08:11:05.561179 1 retrywatcher.go:130] "Watch failed" err="context canceled"

      Expected results:

      When creating new namespaces with the "argocd.argoproj.io/managed-by" label, the new apps are synced as expected and Argo CD does not lock up.

      Reproducibility (Always/Intermittent/Only Once):

      Always, when using the reproducer

      Additional info (Such as Logs, Screenshots, etc):

              jgwest Jonathan West
              rhn-support-skrenger Simon Krenger