OpenShift GitOps / GITOPS-2474

Simplify troubleshooting failing Argo CD instances


    • Type: Epic
    • Resolution: Won't Do

      Epic Goal

      • Simplify troubleshooting of Argo CD instances that users or admins have been alerted are failing, by providing a centralized dashboard page in the console that is unique to OpenShift GitOps. The page should consolidate which instances are healthy or failing, along with other related information.

      NOTE: This epic would build out the 2nd part of this proposal

      Why is this important?

      • At present we have no infrastructure or processes in place to effectively debug why an instance is not in a healthy state. We only have the statuses of workloads tracked in the Argo CD CR status, each of which can be one of 'unknown', 'failed', 'pending', or 'running'. An instance can be in the 'failed' or 'pending' state with no additional information whatsoever, which is not helpful, especially in production environments with multiple instances. Troubleshooting such issues typically involves looking at events, logs, etc., so users and admins need to know how to go about finding the underlying cause.
      • A dedicated page/hub that shows the health status of all instances in a color-coded manner, along with supplementary information such as error logs and events, would accelerate troubleshooting. This would be a significant value add for admins who have to manage multiple instances or alert instance owners about issues.
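
To illustrate the kind of consolidation such a page would perform, here is a minimal sketch that reduces the per-workload statuses of several Argo CD CRs to one color-coded health value per instance and sorts the worst instances first. It works on plain dictionaries rather than a live cluster. The workload field names under `.status`, the color mapping, and the helper names `summarize_instance` and `summarize_fleet` are assumptions made for illustration; they are not taken from this epic or from any existing console API.

```python
# Workload states named in the description, ordered from worst to best.
SEVERITY = ["failed", "pending", "unknown", "running"]

# Hypothetical color coding for a dashboard row.
COLORS = {"failed": "red", "pending": "yellow", "unknown": "grey", "running": "green"}

# Assumption: these workload statuses are string fields directly under .status.
WORKLOAD_FIELDS = ("applicationController", "redis", "repo", "server")


def summarize_instance(cr: dict) -> dict:
    """Reduce one Argo CD CR (as a plain dict) to its worst workload state."""
    status = cr.get("status") or {}
    workloads = {}
    for field in WORKLOAD_FIELDS:
        state = str(status.get(field, "unknown")).lower()
        # Treat any value outside the four known states as 'unknown'.
        workloads[field] = state if state in SEVERITY else "unknown"

    worst = min(workloads.values(), key=SEVERITY.index)
    metadata = cr.get("metadata") or {}
    return {
        "namespace": metadata.get("namespace", ""),
        "name": metadata.get("name", ""),
        "health": worst,
        "color": COLORS[worst],
        # Workloads an admin would look at first when troubleshooting.
        "unhealthy": sorted(n for n, s in workloads.items() if s != "running"),
    }


def summarize_fleet(crs: list) -> list:
    """Summarize every instance and list the least healthy ones first."""
    rows = [summarize_instance(cr) for cr in crs]
    return sorted(rows, key=lambda row: SEVERITY.index(row["health"]))


if __name__ == "__main__":
    sample = [
        {"metadata": {"name": "team-a", "namespace": "ns-a"},
         "status": {"server": "Running", "repo": "Running",
                    "redis": "Running", "applicationController": "Running"}},
        {"metadata": {"name": "team-b", "namespace": "ns-b"},
         "status": {"server": "Failed", "repo": "Pending",
                    "redis": "Running", "applicationController": "Running"}},
    ]
    for row in summarize_fleet(sample):
        print(row["namespace"], row["name"], row["health"], row["color"], row["unhealthy"])
```

A real implementation would read the CRs from the cluster and attach the related events and error logs to each unhealthy row. The sketch covers only the aggregation and color-coding step.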

      Scenarios

      1. I am a cluster admin who manages a cluster running the OpenShift GitOps operator with hundreds of Argo CD instances. I have 

      Acceptance Criteria (Mandatory)

      • There is an established, central monitoring hub in the console for all Argo CD instances managed by the OpenShift GitOps operator that can be used to troubleshoot failing instances

      Dependencies (internal and external)

      1. https://issues.redhat.com/browse/GITOPS-2039

      Done Checklist

      • Acceptance criteria are met
      • Non-functional properties of the Feature have been validated (such as performance, resource, UX, security or privacy aspects)
      • User Journey automation is delivered
      • Support and SRE teams are provided with enough skills to support the feature in production environment

            Assignee: Unassigned
            Reporter: Jaideep Rao (jrao@redhat.com)