Story
Resolution: Done
Major
8
GITOPS Sprint 229, GITOPS Sprint 230
Story (Required)
As a cluster admin using OpenShift GitOps on my cluster and running multiple Argo CD instances, I would like to be able to enable monitoring and alerts on some (or all) of my instance workloads, so that I am alerted if they become unavailable for long periods of time.
Background (Required)
Users need to be able to specify which workloads they need to be alerted about. This tells the operator which workloads to create PrometheusRules for, once the metrics for all workloads have been made available at a new endpoint.
Out of scope
Writing code to create new metrics within the operator
Approach (Required)
For this story we must:
- Create a ServiceMonitor so that Prometheus can scrape the newly exposed /metrics port on the operator service.
- Add a new field to the Argo CD CR to capture whether monitoring is needed for the Argo CD workloads of each individual instance (e.g. `.spec.monitoring.enabled=true`).
- Create new opinionated PrometheusRules for each workload. The operator must not reconcile changes to these rules, because users should be able to modify the rules however they wish.
- For the Argo CD operator, add permissions for the operator serviceAccount to create/delete/manage PrometheusRules (https://github.com/argoproj-labs/argocd-operator/blob/master/bundle/manifests/argocd-operator.clusterserviceversion.yaml#L994). For the GitOps operator, these permissions are already present (https://github.com/redhat-developer/gitops-operator/blob/master/bundle/manifests/gitops-operator.clusterserviceversion.yaml#L415).
- For non-core workloads (such as SSO and notifications), create the rule only if the corresponding feature is enabled.
- If monitoring is disabled, clean up all resources created for it, including the PrometheusRules and the ServiceMonitor.
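The decision logic in the approach above can be sketched as follows. This is a minimal, stdlib-only Go illustration under stated assumptions, not the operator's implementation: `ArgoCDSpec`, `SSOEnabled`, `NotificationsEnabled`, `desiredRules`, `plan` and the workload names are all hypothetical. Only the `.spec.monitoring.enabled` field comes from this story. A real controller would instead work with controller-runtime and the prometheus-operator `PrometheusRule` and `ServiceMonitor` types, and would also delete the ServiceMonitor on the disabled path, which is omitted here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// MonitoringSpec models the proposed `.spec.monitoring` field from this story.
type MonitoringSpec struct {
	Enabled bool `json:"enabled"`
}

// ArgoCDSpec is a hypothetical, heavily trimmed stand-in for the Argo CD CR spec.
// SSOEnabled and NotificationsEnabled are hypothetical flags for non-core features.
type ArgoCDSpec struct {
	Monitoring           MonitoringSpec `json:"monitoring"`
	SSOEnabled           bool           `json:"ssoEnabled"`
	NotificationsEnabled bool           `json:"notificationsEnabled"`
}

// coreWorkloads is an illustrative list; the real list is defined by the operator.
var coreWorkloads = []string{"application-controller", "repo-server", "server"}

// desiredRules returns the workloads that should get a PrometheusRule.
func desiredRules(spec ArgoCDSpec) []string {
	if !spec.Monitoring.Enabled {
		return nil
	}
	rules := append([]string{}, coreWorkloads...) // copy so the package list is never mutated
	// Non-core workloads only get a rule when their feature is enabled.
	if spec.SSOEnabled {
		rules = append(rules, "sso")
	}
	if spec.NotificationsEnabled {
		rules = append(rules, "notifications-controller")
	}
	return rules
}

// plan computes which rules to create and which to delete, given the rules
// that already exist. Existing rules are never updated, so user edits survive.
func plan(spec ArgoCDSpec, existing map[string]bool) (create, remove []string) {
	if !spec.Monitoring.Enabled {
		// Monitoring disabled: clean up every rule created earlier.
		for name := range existing {
			remove = append(remove, name)
		}
		sort.Strings(remove) // deterministic order for callers and tests
		return nil, remove
	}
	for _, name := range desiredRules(spec) {
		if !existing[name] {
			create = append(create, name)
		}
	}
	return create, nil
}

func main() {
	var cr struct {
		Spec ArgoCDSpec `json:"spec"`
	}
	raw := []byte(`{"spec":{"monitoring":{"enabled":true},"notificationsEnabled":true}}`)
	if err := json.Unmarshal(raw, &cr); err != nil {
		panic(err)
	}
	// "server" already has a rule (possibly user-edited), so it is left alone.
	create, _ := plan(cr.Spec, map[string]bool{"server": true})
	fmt.Println(create) // [application-controller repo-server notifications-controller]
}
```

Keeping `plan` free of API calls makes the "create only, never reconcile" and "clean up on disable" rules easy to unit test separately from the Kubernetes client code.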
Dependencies
No dependencies.
Acceptance Criteria (Mandatory)
The operator must successfully create the ServiceMonitor and the required PrometheusRules. The Argo CD CRD must be updated with the new monitoring fields.
INVEST Checklist
Dependencies identified
Blockers noted and expected delivery timelines set
Design is implementable
Acceptance criteria agreed upon
Story estimated
Legend
Unknown
Verified
Unsatisfied
Done Checklist
- Code is completed, reviewed, documented and checked in
- Unit and integration test automation has been delivered and is running cleanly in the continuous integration/staging/canary environment
- Continuous Delivery pipeline(s) are able to proceed with the new code included
- Customer-facing documentation, API docs etc. are produced/updated, reviewed and published
- Acceptance criteria are met