OpenShift GitOps / GITOPS-2820

Expose metrics to gauge operator performance


Details

    • Type: Story
    • Resolution: Done
    • Priority: Major
    • Fix Version/s: 1.10.0
    • Affects Version/s: None
    • Component/s: Operator
    • Labels: None
    • Sprint: GITOPS Sprint 237, GITOPS Sprint 238, GITOPS Sprint 239, GITOPS Sprint 240, GITOPS Sprint 241, GITOPS Sprint 243

    Description

      Story (Required)

      As a consumer of the GitOps operator in the service, I want insight into the operator's performance through a set of well-defined, exposed metrics, so that I know how much load it can handle efficiently.

      Background (Required)

      GitOps service uses the GitOps operator to deploy managed Argo CD instances. To provide a robust and efficient service, we need to know the operator's current performance limits so that we know where to make improvements in the future.

       

      Because the operator is bootstrapped with operator-sdk, it already runs a metrics server and serves some general, Prometheus-friendly, controller-related metrics out of the box thanks to controller-runtime. The operator just needs to expose additional custom metrics that are specific to Argo CD, as sketched below.
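      Below is a minimal sketch, not the actual argocd-operator code, of how the two custom metrics proposed in the Approach section might be defined and registered with controller-runtime's global registry; the package name, variable names and Help strings are illustrative assumptions, and the histogram uses the client's default buckets.

{code:go}
package argocd

import (
	"github.com/prometheus/client_golang/prometheus"
	"sigs.k8s.io/controller-runtime/pkg/metrics"
)

var (
	// Gauge tracking the number of Argo CD instances, partitioned by
	// reconciliation state.
	ArgoCDInstancesReconciled = prometheus.NewGaugeVec(
		prometheus.GaugeOpts{
			Name: "argocd_instances_reconciled",
			Help: "Number of Argo CD instances reconciled, partitioned by state.",
		},
		[]string{"state"},
	)

	// Histogram tracking how long each reconciliation takes, per namespace.
	ArgoCDReconciliationDuration = prometheus.NewHistogramVec(
		prometheus.HistogramOpts{
			Name: "argocd_reconciliation_duration",
			Help: "Duration of Argo CD reconciliations in seconds.",
		},
		[]string{"namespace"},
	)
)

func init() {
	// controller-runtime's global registry backs the metrics server that
	// operator-sdk projects already run, so anything registered here is
	// served on /metrics automatically.
	metrics.Registry.MustRegister(ArgoCDInstancesReconciled, ArgoCDReconciliationDuration)
}
{code}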

      Out of scope

      Creation of metrics dashboards

      Approach (Required)

      This work needs to go into argocd-operator

      1. Generate a ServiceMonitor manifest that is installed out of the box along with the operator, using operator-sdk.
      2. Create the following metrics and register them:
        - argocd_instances_reconciled (type: gauge, selector: state)
        - argocd_reconciliation_duration (type: histogram, selector: namespace)
        (Other metrics such as total reconciliations, CPU/memory usage, and the number of goroutines are already exposed by default.)
      3. Go through the reconciler code and find appropriate places to update the metrics established above; see https://github.com/argoproj-labs/argocd-operator/pull/830 for reference, and the instrumentation sketch after this list.
      4. Once the metrics are updated, they are automatically exposed on the already-running metrics server.
      5. Add unit/e2e tests to verify that the required metrics are exposed properly.
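      The following is a rough sketch of steps 3 and 4 only, not the actual change from the PR referenced above: the reconciler struct, the "reconciled" state label value, and the reuse of the metric variables from the earlier registration sketch are all assumptions made for illustration.

{code:go}
package argocd

import (
	"context"
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// ReconcileArgoCD is a placeholder for the operator's Argo CD reconciler.
type ReconcileArgoCD struct {
	client.Client
}

func (r *ReconcileArgoCD) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	start := time.Now()
	defer func() {
		// Record how long this reconciliation took, labelled by namespace.
		ArgoCDReconciliationDuration.WithLabelValues(req.Namespace).Observe(time.Since(start).Seconds())
	}()

	// ... fetch the ArgoCD CR and reconcile its child resources ...

	// Update the gauge once the outcome of the reconciliation is known;
	// a failure path would use a different "state" label value.
	ArgoCDInstancesReconciled.WithLabelValues("reconciled").Inc()

	return ctrl.Result{}, nil
}
{code}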

      See for more guidance:
      1. Guide to exposing controller-runtime metrics with operator-sdk: https://docs.okd.io/4.9/operators/operator_sdk/osdk-monitoring-prometheus.html
      2. Prometheus metric types and operations: https://prometheus.io/docs/concepts/metric_types/
      3. Default controller-runtime metrics exposed: https://book.kubebuilder.io/reference/metrics-reference.html

      Dependencies

      none 

      Acceptance Criteria (Mandatory)

      • Required metrics are exposed and can be accessed at the /metrics endpoint
      • Unit/e2e tests are added to verify the behavior (a rough test sketch follows below)
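      As an illustration of the second criterion, a unit test along these lines could verify exposure with the client_golang testutil helpers; it reuses the hypothetical metric variable and Help text from the sketches above.

{code:go}
package argocd

import (
	"strings"
	"testing"

	"github.com/prometheus/client_golang/prometheus/testutil"
)

func TestArgoCDInstancesReconciledIsExposed(t *testing.T) {
	// Simulate a successful reconciliation being recorded.
	ArgoCDInstancesReconciled.WithLabelValues("reconciled").Inc()

	// CollectAndCompare checks the collector's output in Prometheus text format.
	expected := `
# HELP argocd_instances_reconciled Number of Argo CD instances reconciled, partitioned by state.
# TYPE argocd_instances_reconciled gauge
argocd_instances_reconciled{state="reconciled"} 1
`
	if err := testutil.CollectAndCompare(ArgoCDInstancesReconciled, strings.NewReader(expected)); err != nil {
		t.Errorf("unexpected metrics output: %v", err)
	}
}
{code}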

      INVEST Checklist

      Dependencies identified

      Blockers noted and expected delivery timelines set

      Design is implementable

      Acceptance criteria agreed upon

      Story estimated


      Done Checklist

      • Code is completed, reviewed, documented and checked in
      • Unit and integration test automation have been delivered and running cleanly in continuous integration/staging/canary environment
      • Continuous Delivery pipeline(s) is able to proceed with new code included
      • Customer facing documentation, API docs etc. are produced/updated, reviewed and published
      • Acceptance criteria are met
