- Epic
- Resolution: Done
- Major
- None
- None
- Allow admin users to create new alerting rules based on platform metrics
- False
- False
- NEW
- To Do
- OBSDA-2 - Allow administrators to create new alerting rules based on platform-defined metrics
- Impediment
- NEW
- 0% To Do, 0% In Progress, 100% Done
- Enhancement
- Done
Epic Goal
- Allow admin users to create new alerting rules targeting metrics in any namespace
- Allow cloning of existing rules to simplify rule creation
- Allow creation of silences for existing alerting rules
Why is this important?
- Currently, platform-related metrics (those exposed in openshift-*, kube-*, and default namespaces) cannot be used to form a new alerting rule. This makes it very difficult for administrators to enrich the out-of-the-box experience of the OpenShift Container Platform with new rules that may be specific to their environments.
- Additionally, customers have asked to modify our existing out-of-the-box alerting rules (for instance, to tweak the alert expression or change the severity label). Unfortunately, that is not easy: most rules come from several open source projects or from other OpenShift components, and any modification would mean upgrades are no longer seamless. Imagine K8s changes its metrics again (see 1.14) and we have to update our rules. We would not know what modifications had been made (even a changed threshold is hard to reconcile if upstream changes it as well), so we would not be able to upgrade these rules.
Scenarios
- I'd like to modify the query expression of an existing rule (because the threshold value doesn't match with my environment).
Cloning the existing rule should end up with a new rule in the same namespace.
Modifications can then be made to the new rule.
(Optional) You can silence the existing rule.
- I'd like to create a new rule based on a metric only available to an openshift-* namespace.
Create a new PrometheusRule object inside the namespace that includes the metrics you need to form the alerting rule.
- I'd like to update the label of an existing rule.
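The first two scenarios above can be sketched in Python as a rough illustration, not as the implementation this epic delivers. The sketch clones an alerting rule, overrides its expression and severity label, and wraps the result in a PrometheusRule manifest. The alert names, the metric `example_errors_total`, the thresholds, the namespace `openshift-example`, and both helper functions are hypothetical and chosen only for illustration.

```python
import copy
import json


def build_prometheus_rule(name, namespace, group, rules):
    """Return a PrometheusRule manifest as a plain dict."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PrometheusRule",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"groups": [{"name": group, "rules": rules}]},
    }


def clone_alert_rule(rule, new_alert, expr=None, labels=None):
    """Copy an alerting rule, optionally overriding its expression and labels.

    The original rule dict is left untouched.
    """
    clone = copy.deepcopy(rule)
    clone["alert"] = new_alert
    if expr is not None:
        clone["expr"] = expr
    if labels:
        clone.setdefault("labels", {}).update(labels)
    return clone


# Hypothetical existing rule: the name, metric, and threshold are made up.
existing = {
    "alert": "ExampleHighErrorRate",
    "expr": "rate(example_errors_total[5m]) > 0.05",
    "for": "10m",
    "labels": {"severity": "warning"},
}

# Scenario 1: clone the rule, then adjust the threshold and severity.
custom = clone_alert_rule(
    existing,
    new_alert="ExampleHighErrorRateCustom",
    expr="rate(example_errors_total[5m]) > 0.20",
    labels={"severity": "critical"},
)

# Scenario 2: place the new rule in the namespace that exposes the metric.
# "openshift-example" is a placeholder namespace, not a real one.
manifest = build_prometheus_rule(
    name="example-custom-rules",
    namespace="openshift-example",
    group="example.rules",
    rules=[custom],
)

print(json.dumps(manifest, indent=2))
```

The printed JSON could then be applied to the cluster like any other manifest. Cloning into a separate rule, rather than editing the shipped one, leaves the original untouched so that it can still be upgraded.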
Acceptance Criteria
- CI - MUST be running successfully with tests automated
- Release Technical Enablement - Provide necessary release enablement details and documents.
- Ability to distinguish between rules deployed by us (CMO) and user-created rules
Dependencies (internal and external)
Previous Work (Optional):
Open questions:
- Distinguish between operator-created rules and user-created rules
Currently no such mechanism exists. This will need to be added to prometheus-operator or cluster-monitoring-operator.
Done Checklist
- CI - CI is running, tests are automated and merged.
- Release Enablement <link to Feature Enablement Presentation>
- DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
- DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
- DEV - Downstream build attached to advisory: <link to errata>
- QE - Test plans in Polarion: <link or reference to Polarion>
- QE - Automated tests merged: <link or reference to automated tests>
- DOC - Downstream documentation merged: <link to meaningful PR>
- is documented by
- RHDEVDOCS-3423 [TP] Allow admin users to create new alerting rules based on platform metrics
- Closed
- is related to
- OBSDA-210 Graduate alert overrides and alert relabelings to GA
- In Progress