When running Red Hat OpenShift GitOps, it's possible to create additional Argo CD instances besides the one created in the openshift-gitops namespace by using the ArgoCD custom resource.
The problem is that the OpenShift GitOps Operator does not report any metrics about overall instance health and availability. As a result, malfunctioning instances are hard to catch and fix, which can impact production environments.
For example, when a ResourceQuota prevents the redis pod from starting, nothing is reported besides the state in the status of the actual ArgoCD instance.
$ oc get argocd openshift-gitops -o json | jq '.status'
{
  "applicationController": "Running",
  "dex": "Running",
  "host": "openshift-gitops-server-project-100.apps.foo.bar.intra",
  "phase": "Available",
  "redis": "Pending",
  "repo": "Running",
  "server": "Running",
  "ssoConfig": "Success"
}
We can see "redis": "Pending", but besides that, it is not clear whether the other ArgoCD functions operate as intended. Note that "phase" still reports "Available" even though redis is not running. Further, the OpenShift GitOps Operator keeps trying to reconcile the resource without success (until the ResourceQuota is adjusted).
Since an OpenShift Container Platform 4 cluster can grow large and can host many different ArgoCD instances, a way is required to detect problematic states in each ArgoCD instance and report them in a central location, such as an Operator Condition or metrics, to trigger an alert. This makes Cluster Administrators aware of a potential problem so that it can be solved.
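The check described above can be sketched in a few lines of Python. This is only an illustration, not how the Operator works: the function name `degraded_components` is hypothetical, and treating exactly the five keys below as components (and anything other than "Running" as a problem) is an assumption based on the sample output.

```python
import json

# Status fields that describe a workload component, taken from the sample
# output above. Treating exactly these keys as components is an assumption.
COMPONENT_KEYS = ("applicationController", "dex", "redis", "repo", "server")


def degraded_components(status: dict) -> dict:
    """Return the components whose reported state is not "Running"."""
    return {
        key: status[key]
        for key in COMPONENT_KEYS
        if key in status and status[key] != "Running"
    }


if __name__ == "__main__":
    # Same payload as the `jq '.status'` output shown above.
    raw = """
    {
      "applicationController": "Running",
      "dex": "Running",
      "host": "openshift-gitops-server-project-100.apps.foo.bar.intra",
      "phase": "Available",
      "redis": "Pending",
      "repo": "Running",
      "server": "Running",
      "ssoConfig": "Success"
    }
    """
    print(degraded_components(json.loads(raw)))
```

For the sample status, only redis is returned, even though "phase" is "Available". That gap is what a dedicated health signal would need to close.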
Acceptance Criteria
- Research how to enable OpenShift Monitoring to watch/monitor instance status (as above).
- Create a metric that exposes the instance status.
- Get it to work and document the steps in the User Guide.
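As a rough idea of what such a metric could look like, the sketch below renders one gauge sample per component in the Prometheus text exposition format. The metric name `argocd_component_running`, its labels, and the function names are hypothetical and chosen only for illustration. The actual metric design is still subject to the research item above.

```python
# Component keys as seen in the ArgoCD status output; an assumption of this sketch.
COMPONENT_KEYS = ("applicationController", "dex", "redis", "repo", "server")

# Hypothetical metric name, not an existing OpenShift GitOps metric.
METRIC_NAME = "argocd_component_running"


def escape_label(value: str) -> str:
    """Escape a label value as the Prometheus text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_component_metrics(namespace: str, name: str, status: dict) -> str:
    """Render one gauge sample per component: 1 if "Running", else 0."""
    lines = [
        f"# HELP {METRIC_NAME} 1 if the component reports Running, 0 otherwise.",
        f"# TYPE {METRIC_NAME} gauge",
    ]
    for component in COMPONENT_KEYS:
        if component not in status:
            continue  # component not reported for this instance
        value = 1 if status[component] == "Running" else 0
        labels = (
            f'namespace="{escape_label(namespace)}",'
            f'name="{escape_label(name)}",'
            f'component="{escape_label(component)}"'
        )
        lines.append(f"{METRIC_NAME}{{{labels}}} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    status = {"redis": "Pending", "server": "Running"}
    print(render_component_metrics("openshift-gitops", "openshift-gitops", status), end="")
```

With a metric of this shape, an alerting rule could fire on an expression such as `argocd_component_running == 0`, giving Cluster Administrators one central signal across all instances.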
- is blocked by
  - GITOPS-1989 Expose status of ApplicationSet controller to Operand (Closed)
- is depended on by
  - GITOPS-2474 Simplify troubleshooting failing Argo CD instances (Closed)