Type: Story
Resolution: Done
Priority: Major
Fix Version/s: 1.8.0
Affects Version/s: None
Component/s: Operator
Labels:
None

Story Points:
8
Epic Link:
Introduce Instance/Operator level metrics/monitoring for OpenShift GitOps
Blocked:
False
Blocked Reason:
None
Ready:
False
Intelligence Requested:
Market:

Sprint:
GITOPS Sprint 229, GITOPS Sprint 230

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Story (Required)

As a cluster admin using OpenShift GitOps on my cluster and running multiple Argo CD instances, I would like to be able to enable monitoring/alerts on some (or all) of my instance workloads so that I am alerted if they become unavailable for long periods of time.

Background (Required)

Users need to be able to express which workloads they need to be alerted about. This lets the operator know which workloads to create Prometheus rules for once the metrics about all workloads have been made available at a new endpoint.

Out of scope

Writing code to create new metrics within the operator

Approach (Required)

For this story we must:

Create a ServiceMonitor so that Prometheus can watch the operator service on its newly exposed /metrics port.

Add a new field to the CR to capture whether monitoring is needed for the Argo CD workloads of each individual instance (e.g. `.spec.monitoring.enabled=true`).

Create new opinionated PrometheusRules for each workload. The operator must not reconcile changes to these rules, as users should be able to modify them however they wish.
For the Argo CD operator, add permissions for the operator serviceAccount to create/delete/manage PrometheusRules (https://github.com/argoproj-labs/argocd-operator/blob/master/bundle/manifests/argocd-operator.clusterserviceversion.yaml#L994). For the GitOps operator these permissions are already present (https://github.com/redhat-developer/gitops-operator/blob/master/bundle/manifests/gitops-operator.clusterserviceversion.yaml#L415).

For non-core workloads (such as SSO and notifications), the feature itself must be enabled in order for the rule to be created.
If monitoring is disabled, clean up all resources created for it, including the PrometheusRules and the ServiceMonitor.
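
The rule-handling behaviour listed above could be sketched roughly as follows. This is only an illustration, not the operator's actual code: the `Workload` type, the `rulesWanted`, `defaultRule` and `reconcileRules` functions, and the in-memory map standing in for the cluster are all hypothetical names invented for this example.

```go
package main

import "fmt"

// Workload is a hypothetical description of one Argo CD workload.
type Workload struct {
	Name    string
	Core    bool // core workloads always exist for an instance
	Enabled bool // for non-core workloads (SSO, notifications): is the feature on?
}

// rulesWanted lists the workloads that should have a PrometheusRule.
// Non-core workloads only qualify when their feature is enabled.
func rulesWanted(monitoringEnabled bool, workloads []Workload) []string {
	var names []string
	if !monitoringEnabled {
		return names
	}
	for _, w := range workloads {
		if w.Core || w.Enabled {
			names = append(names, w.Name)
		}
	}
	return names
}

// defaultRule stands in for the opinionated rule the operator would create.
func defaultRule(name string) string {
	return "default availability rule for " + name
}

// reconcileRules applies the behaviour to "existing", a map that stands in
// for the PrometheusRules created for this instance on the cluster.
func reconcileRules(monitoringEnabled bool, workloads []Workload, existing map[string]string) {
	if !monitoringEnabled {
		// Monitoring disabled: clean up everything created for it.
		for name := range existing {
			delete(existing, name)
		}
		return
	}
	for _, name := range rulesWanted(monitoringEnabled, workloads) {
		// Create only when absent, so user edits are never overwritten.
		if _, found := existing[name]; !found {
			existing[name] = defaultRule(name)
		}
	}
}

func main() {
	workloads := []Workload{
		{Name: "server", Core: true},
		{Name: "sso", Enabled: false},
		{Name: "notifications", Enabled: true},
	}
	rules := map[string]string{"server": "rule edited by the user"}

	reconcileRules(true, workloads, rules)
	fmt.Println("rules after enabling monitoring:", len(rules))
	fmt.Println("server rule:", rules["server"])

	reconcileRules(false, workloads, rules)
	fmt.Println("rules after disabling monitoring:", len(rules))
}
```

The create-if-absent check is what keeps the operator from reverting user changes; a real implementation would make the equivalent check against the cluster API rather than a map.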

Dependencies

No dependencies.

Acceptance Criteria (Mandatory)

The operator must successfully create the ServiceMonitor and the required PrometheusRules. The Argo CD CRD must be updated with the new fields for monitoring.

INVEST Checklist

Dependencies identified

Blockers noted and expected delivery timelines set

Design is implementable

Acceptance criteria agreed upon

Story estimated

Done Checklist

Code is completed, reviewed, documented and checked in
Unit and integration test automation has been delivered and is running cleanly in the continuous integration/staging/canary environment
Continuous Delivery pipeline(s) are able to proceed with the new code included
Customer facing documentation, API docs etc. are produced/updated, reviewed and published
Acceptance criteria are met

links to

openshift/openshift-docs#55921: Document GitOps v1.8 RN

Assignee: Jaideep Rao

Reporter: Jaideep Rao

Votes: 0

Watchers: 3

Created: 2022/12/07 12:43 PM

Updated: 2023/03/16 3:00 PM

Resolved: 2023/01/23 1:24 PM