-
Story
-
Resolution: Done
-
Major
-
None
-
None
-
8
-
False
-
None
-
False
GITOPS Sprint 229
Story (Required)
As a cluster admin who has the GitOps operator installed on my cluster, I would like the operator to expose instance-level health metrics so that I have the option to consume them if I want to.
Background (Required)
To set up alerts that notify us when instance workloads are down, we must first have a way to encode this information in Prometheus. This requires us to define new instance-level metrics within the operator and expose them at a new endpoint so that Prometheus can consume them.
Out of scope
- metrics other than the status of a workload
- setting up resources to actually send alerts out
Approach (Required)
As part of this story we must:
- Define and create new metrics within the operator to describe each workload's current status
- Start a new metrics server goroutine within the operator, on a new port
- Write operator logic in various places to track and update metric values locally whenever a workload's status changes
- Use the Prometheus Go client library
- Use the Argo CD Image Updater code for reference -> https://github.com/argoproj-labs/argocd-image-updater/blob/master/pkg/metrics/metrics.go
The metrics we need to focus on at this stage are:
- applicationControllerStatus
- applicationSetControllerStatus
- repoServerStatus
- serverStatus
- notificationControllerStatus
- dexStatus
- redisStatus
- argoCDPhaseStatus
Each metric takes an integer value from 0 to 3, mapped as follows:
“Unknown” = 0
“Pending” = 1
“Running” = 2
“Available” = 3
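One possible way to encode this mapping in Go is shown below. The constant and function names are hypothetical, and treating any unrecognized status string as "Unknown" is an assumption of this sketch rather than something the story specifies.

```go
package main

// Gauge values for a workload's status, following the mapping above.
// float64 is used because Prometheus gauges store float64 values.
const (
	statusUnknown   float64 = 0
	statusPending   float64 = 1
	statusRunning   float64 = 2
	statusAvailable float64 = 3
)

// statusValue (hypothetical helper) converts a status string into its gauge
// value. Unrecognized strings map to Unknown, which is an assumption.
func statusValue(status string) float64 {
	switch status {
	case "Pending":
		return statusPending
	case "Running":
		return statusRunning
	case "Available":
		return statusAvailable
	default:
		return statusUnknown
	}
}
```

Keeping the conversion in one function means every place that updates a metric uses the same mapping, so a gauge update reduces to setting the gauge to `statusValue(currentStatus)`.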
The value of each metric must be tracked and updated to reflect the status of the actual workload, and published to the /metrics endpoint immediately.
Dependencies
Depends on https://issues.redhat.com/browse/GITOPS-1989
Acceptance Criteria (Mandatory)
The operator spins up a new metrics server internally and exposes the new metrics at the /metrics endpoint, where they are available for consumption.
INVEST Checklist
Dependencies identified
Blockers noted and expected delivery timelines set
Design is implementable
Acceptance criteria agreed upon
Story estimated
Legend
Unknown
Verified
Unsatisfied
Done Checklist
- Code is completed, reviewed, documented and checked in
- Unit and integration test automation has been delivered and is running cleanly in the continuous integration/staging/canary environment
- Continuous Delivery pipeline(s) are able to proceed with the new code included
- Customer facing documentation, API docs etc. are produced/updated, reviewed and published
- Acceptance criteria are met