Type: Story
Resolution: Done
Priority: Major
Fix Version/s: 1.8.0
Affects Version/s: None
Component/s: Operator
Labels: None
Story Points: 8
Epic Link: Introduce Instance/Operator level metrics/monitoring for OpenShift GitOps
Blocked: False
Blocked Reason: None
Ready: False
Intelligence Requested:
Market:
Sprint: GITOPS Sprint 229

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Story (Required)

As a cluster admin who has the GitOps operator installed on my cluster, I would like the GitOps operator to expose instance-level health metrics so that I have the option to consume them should I want to.

Background (Required)

To set up alerts that notify us when instance workloads are down, that information must first be available to Prometheus. This requires defining new instance-level metrics within the operator and exposing them at a new endpoint that Prometheus can scrape.

Out of scope

- metrics other than the status of a workload
- setting up resources to actually send alerts out

Approach (Required)

As part of this story we must:

- Define and create new metrics within the operator that describe each workload's current status
- Start a new metrics server goroutine within the operator, on a new port
- Add operator logic in the relevant places to track and update metric values locally whenever a workload's status changes
- Use the Prometheus Go client library
- Use the Argo CD Image Updater code for reference: https://github.com/argoproj-labs/argocd-image-updater/blob/master/pkg/metrics/metrics.go

The metrics to focus on at this stage are:

- applicationControllerStatus
- applicationSetControllerStatus
- repoServerStatus
- serverStatus
- notificationControllerStatus
- dexStatus
- redisStatus
- argoCDPhaseStatus

Each metric takes an integer value from 0 to 3, mapped as follows:

- Unknown = 0
- Pending = 1
- Running = 2
- Available = 3
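The 0-3 encoding above could be captured in Go as a small enum. This is a minimal sketch: the type and function names (`WorkloadStatus`, `ParseWorkloadStatus`) are hypothetical, and treating any unrecognized status string as Unknown is an assumption, not something this story specifies.

```go
package main

// WorkloadStatus is the numeric value published for each status metric.
type WorkloadStatus int

const (
	StatusUnknown   WorkloadStatus = 0
	StatusPending   WorkloadStatus = 1
	StatusRunning   WorkloadStatus = 2
	StatusAvailable WorkloadStatus = 3
)

// ParseWorkloadStatus maps a status string to its metric value.
// Assumption: any unrecognized string is reported as Unknown (0).
func ParseWorkloadStatus(s string) WorkloadStatus {
	switch s {
	case "Pending":
		return StatusPending
	case "Running":
		return StatusRunning
	case "Available":
		return StatusAvailable
	default:
		return StatusUnknown
	}
}

// String returns the human-readable name of the status.
func (s WorkloadStatus) String() string {
	switch s {
	case StatusPending:
		return "Pending"
	case StatusRunning:
		return "Running"
	case StatusAvailable:
		return "Available"
	default:
		return "Unknown"
	}
}
```

A gauge would then be set to `float64(ParseWorkloadStatus(status))`.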

The value of each metric must be updated to track the actual status of its workload and published to the /metrics endpoint immediately.

Dependencies

Depends on https://issues.redhat.com/browse/GITOPS-1989

Acceptance Criteria (Mandatory)

The operator spins up a new internal metrics server and exposes the new metrics at the /metrics endpoint, where they can be consumed (for example, scraped by Prometheus).

INVEST Checklist

Dependencies identified

Blockers noted and expected delivery timelines set

Design is implementable

Acceptance criteria agreed upon

Story estimated

Legend

Unknown

Verified

Unsatisfied

Done Checklist

Code is completed, reviewed, documented and checked in
Unit and integration test automation has been delivered and is running cleanly in the continuous integration/staging/canary environment
Continuous Delivery pipeline(s) can proceed with the new code included
Customer facing documentation, API docs etc. are produced/updated, reviewed and published
Acceptance criteria are met

Assignee: Jaideep Rao

Reporter: Jaideep Rao

Votes: 0

Watchers: 2

Created: 2022/12/07 12:34 PM

Updated: 2023/01/03 4:11 PM

Resolved: 2023/01/03 4:11 PM
