  1. OpenShift GitOps
  2. GITOPS-2808

ArgoCD should handle failure to reach a cluster gracefully


    • Type: Bug
    • Resolution: Duplicate
    • Priority: Normal
    • Component: ArgoCD

      Description of problem:

      This issue is related to https://issues.redhat.com/browse/GITOPS-2643 (Red Hat OpenShift GitOps applications are clogged if one of the applications is stuck in ArgoCD).

      Support case: https://access.redhat.com/support/cases/#/case/03354591 

      GITOPS-2643 is currently under investigation. The current finding is:

      "We (gnunn@redhat.com et al.) have diagnosed that the issue is caused by a deleted (by ACM) cluster, and therefore there is no connectivity to the cluster."

      "Jann analyzed the logs, and it appears that there are connectivity issues between the hub/central ArgoCD and one or more remote clusters. Our working theory is that this causes the PlacementDecision to be updated, which impacts the GitOpsCluster and ApplicationSet resources that depend on the corresponding Placements. Please add the following toleration to all of the Placements that are used with GitOpsCluster and ApplicationSet; it instructs the PlacementDecision to tolerate an unhealthy cluster.

      tolerations:
        - key: cluster.open-cluster-management.io/unreachable
          operator: Exists

      You can see a similar issue reported upstream here: https://kubernetes.slack.com/archives/C01GE7YSUUF/p1678960735116749"
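      For context, the toleration quoted above would sit under `spec` in a Placement resource. The following is a minimal sketch, not taken from the support case: the `name` and `namespace` values are hypothetical placeholders, and the `apiVersion` is an assumption that may differ depending on the ACM release in use.

```yaml
# Hypothetical Placement; metadata values are placeholders, not from the case.
apiVersion: cluster.open-cluster-management.io/v1beta1  # assumption: may vary by ACM version
kind: Placement
metadata:
  name: example-placement        # hypothetical name
  namespace: openshift-gitops    # hypothetical namespace
spec:
  tolerations:
    # Keep clusters carrying the "unreachable" taint in the PlacementDecision
    # instead of dropping them when connectivity is lost.
    - key: cluster.open-cluster-management.io/unreachable
      operator: Exists
```

      Any other fields already present in an existing Placement `spec` would stay as they are; only the `tolerations` entry is added.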

      For this issue: when a cluster is deleted (e.g. by ACM), ArgoCD should handle it gracefully and should not become clogged.

      Note: the reason we opened this ticket separately is that aveerama@redhat.com agrees that GITOPS-2643 does not address the ArgoCD clogging behavior.

       

      Prerequisites (if any, like setup, operators/versions):

      See GITOPS-2643

      Actual results:

      If one of the applications is stuck, the other applications in the queue are blocked.

      Expected results:

      When a cluster is deleted (e.g. by ACM), ArgoCD should handle it gracefully and should not become clogged.

       

      Build Details:

      OCP: 4.10

      Red Hat OpenShift GitOps: 1.6.2

      Short summary:

      • The client has installed the "Red Hat OpenShift GitOps" operator in their OCP cluster.
      • They created multiple ArgoCD instances in different namespaces.
      • They have multiple applications deployed from these instances.
      • If any one of the applications gets stuck, it blocks the rest of the applications in the queue.

       

      Additional info:

       

       

       

              Assignee: Unassigned
              Reporter: William Tam (wtam_at_redhat)