Type: Epic
Resolution: Done
Priority: Major
Fix Version/s: 1.8.0
Affects Version/s: None
Component/s: Operator
Labels:
- doc-req
- gitops
- observability
- operator

Epic Name:
Introduce Instance/Operator level metrics/monitoring for OpenShift GitOps
Story Points:
5
Blocked:
False
Blocked Reason:
None
Ready:
False
Hierarchy Progress Bar:

0% To Do, 0% In Progress, 100% Done

Sprint:
GITOPS Sprint 221

SFDC Cases Links:
SFDC Cases Counter:
SFDC Cases Open:

When running Red Hat OpenShift GitOps, it is possible to create additional Argo CD instances, besides the one created in the openshift-gitops namespace, by using the ArgoCD resource.

The problem is that the OpenShift GitOps Operator does not report any metric about overall instance health and availability. As a result, malfunctioning instances are hard to detect and fix, which can impact production environments.

For example, when a ResourceQuota prevents the redis pod from starting, nothing is reported besides the status of the affected ArgoCD instance itself.

$ oc get argocd openshift-gitops -o json | jq '.status'
{
  "applicationController": "Running",
  "dex": "Running",
  "host": "openshift-gitops-server-project-100.apps.foo.bar.intra",
  "phase": "Available",
  "redis": "Pending",
  "repo": "Running",
  "server": "Running",
  "ssoConfig": "Success"
}

We can see that "redis" is "Pending" while "phase" still reports "Available". Beyond that, it is unclear whether the other Argo CD components are working as intended. Further, the OpenShift GitOps Operator keeps trying to reconcile the resource without success (until the ResourceQuota is adjusted).
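The status above can be checked mechanically. The following Python sketch flags every component whose status differs from its expected value. The EXPECTED mapping (which status keys count as components, and which value counts as healthy for each) is an assumption based only on the fields in the output above, not on the operator's own definition. The input is the `.status` object as printed by `oc get argocd ... -o json | jq '.status'`.

```python
# Assumption: status keys treated as components, mapped to the value that
# counts as healthy. Other keys (e.g. "host", "phase") are ignored.
EXPECTED = {
    "applicationController": "Running",
    "dex": "Running",
    "redis": "Running",
    "repo": "Running",
    "server": "Running",
    "ssoConfig": "Success",
}


def degraded_components(status: dict) -> dict:
    """Return {component: observed value} for known components not in their expected state."""
    return {
        key: status[key]
        for key, expected in EXPECTED.items()
        if key in status and status[key] != expected
    }


# The .status object of the ArgoCD instance from the example output.
STATUS = {
    "applicationController": "Running",
    "dex": "Running",
    "host": "openshift-gitops-server-project-100.apps.foo.bar.intra",
    "phase": "Available",
    "redis": "Pending",
    "repo": "Running",
    "server": "Running",
    "ssoConfig": "Success",
}

if __name__ == "__main__":
    print("phase:", STATUS["phase"])
    print("degraded:", degraded_components(STATUS))
```

Even though the phase is "Available", the check reports redis as degraded, which is exactly the kind of signal the operator does not surface today.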

Since an OpenShift Container Platform 4 cluster can grow large and host many different Argo CD instances, a mechanism is required that detects problematic states in each Argo CD instance and reports them in a central location, either as an Operator Condition or via metrics, so that an alert can be triggered, the problem can be resolved, and Cluster Administrators are aware of the potential problem.

Acceptance Criteria

- Research how to enable OpenShift Monitoring to watch/monitor instance status (as described above).
- Create the metric(s).
- Get it to work and document the steps in the User Guide.

is blocked by

GITOPS-1989 Expose status of ApplicationSet controller to Operand

Closed

is depended on by

GITOPS-2474 Simplify troubleshooting failing Argo CD instances

Closed

Assignee: Jaideep Rao

Reporter: Simon Reber

Votes: 0

Watchers: 5

Created: 2022/05/30 11:21 AM

Updated: 2023/09/20 2:57 AM

Resolved: 2023/02/01 5:19 AM
