-
Story
-
Resolution: Done
-
Major
-
None
-
None
-
8
-
False
-
None
-
False
-
SECFLOWOTL-87 - Operator to handle fleets of ArgoCD CRs
-
-
-
GITOPS Sprint 237, GITOPS Sprint 238, GITOPS Sprint 239, GITOPS Sprint 240, GITOPS Sprint 241, GITOPS Sprint 243
Story (Required)
As a consumer of the GitOps operator in the service, I want insight into the operator's performance through a set of well-defined, exposed metrics, so that I know how much load it can handle efficiently.
Background (Required)
The GitOps service uses the GitOps operator to deploy managed Argo CD instances. To provide a robust and efficient service, we need to know the operator's current performance limits so that we can see where to make improvements in the future.
Because the operator is bootstrapped with operator-sdk, it already runs a metrics server and serves some general Prometheus-friendly controller metrics out of the box, thanks to controller-runtime. The operator only needs to expose additional custom metrics that are specific to Argo CD.
Out of scope
creation of metrics dashboards
Approach (Required)
This work needs to go into argocd-operator
- Using operator-sdk, generate a ServiceMonitor manifest that is installed out of the box along with the operator
- Create the following metrics and register them
- argocd_instances_reconciled (type gauge, selector: state)
- argocd_reconciliation_duration (type histogram, selector: namespace)
(other metrics such as total_reconciliations, CPU/memory usage, and number of goroutines are already exposed by default)
- Go through the reconciler code and find appropriate places to update the metrics established above
see https://github.com/argoproj-labs/argocd-operator/pull/830 for reference
- Once updated, the metrics are automatically exposed on the already running server
- Add unit/e2e tests to verify proper exposure of required metrics
For more guidance, see:
1. guide to expose controller-runtime metrics with operator-sdk https://docs.okd.io/4.9/operators/operator_sdk/osdk-monitoring-prometheus.html
2. prometheus metrics types and operations https://prometheus.io/docs/concepts/metric_types/
3. default controller-runtime metrics exposed https://book.kubebuilder.io/reference/metrics-reference.html
Dependencies
none
Acceptance Criteria (Mandatory)
- Required metrics are exposed and can be accessed at the /metrics endpoint
- unit/e2e tests are added to verify behavior
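One way to check the first criterion from a test is to scrape the endpoint and look for the two metric names in the Prometheus text exposition output. The following is only a minimal sketch using the Go standard library, not the operator's actual test code. The helper names (metricTypes, missingFrom, missingFromText, missingFromEndpoint) are hypothetical, and the sketch assumes the server answers in the plain-text format with "# TYPE" lines.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// requiredMetrics lists the custom metric names from the Approach section.
var requiredMetrics = []string{
	"argocd_instances_reconciled",
	"argocd_reconciliation_duration",
}

// metricTypes maps each metric family name to its type, based on the
// "# TYPE <name> <type>" lines of the Prometheus text exposition format.
func metricTypes(r io.Reader) (map[string]string, error) {
	types := make(map[string]string)
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 4 && fields[0] == "#" && fields[1] == "TYPE" {
			types[fields[2]] = fields[3]
		}
	}
	return types, sc.Err()
}

// missingFrom returns the required metric names that have no TYPE line in r.
func missingFrom(r io.Reader, required []string) ([]string, error) {
	types, err := metricTypes(r)
	if err != nil {
		return nil, err
	}
	var missing []string
	for _, name := range required {
		if _, ok := types[name]; !ok {
			missing = append(missing, name)
		}
	}
	return missing, nil
}

// missingFromText is a convenience wrapper for payloads held in memory,
// which keeps unit tests independent of a running server.
func missingFromText(payload string, required []string) ([]string, error) {
	return missingFrom(strings.NewReader(payload), required)
}

// missingFromEndpoint scrapes url (hypothetical helper for e2e tests) and
// reports which required metrics are absent from the response.
func missingFromEndpoint(url string, required []string) ([]string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s from %s", resp.Status, url)
	}
	return missingFrom(resp.Body, required)
}
```

An e2e test could call missingFromEndpoint with the operator's metrics URL (the address and port depend on the deployment and are not specified in this story) and fail if the returned slice is non-empty. Labelled metrics may only show up after they have been updated at least once, so the check is probably best run after an Argo CD instance has been reconciled.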
INVEST Checklist
Dependencies identified
Blockers noted and expected delivery timelines set
Design is implementable
Acceptance criteria agreed upon
Story estimated
Legend
Unknown
Verified
Unsatisfied
Done Checklist
- Code is completed, reviewed, documented and checked in
- Unit and integration test automation has been delivered and is running cleanly in the continuous integration/staging/canary environment
- Continuous Delivery pipeline(s) is able to proceed with new code included
- Customer facing documentation, API docs etc. are produced/updated, reviewed and published
- Acceptance criteria are met
- is blocked by
-
GITOPS-2973 Incorrect phase reconciliation for Argo CD instances
- Closed
- is cloned by
-
GITOPS-3244 [Backport] Expose metrics to guage operator performance
- Closed