Type: Epic
Resolution: Done
Priority: Major
Fix Version/s: openshift-4.11
Affects Version/s: None
Component/s: None
Labels:
- groomed
- telco-5g

Epic Name:
Allow admin users to create new alerting rules based on platform metrics
Blocked:
False
Ready:
False
Docs QE Status:
NEW
Epic Status:
To Do
Feature Link:
OBSDA-2 - Allow administrators to create new alerting rules based on platform-defined metrics
Flagged:

Impediment
QE Status:
NEW
Hierarchy Progress Bar:

0% To Do, 0% In Progress, 100% Done
Release Note Text:

==== New option to create alerts based on core platform metrics

With this release, administrators can create new alerting rules based on core platform metrics.
You can now modify settings for existing platform alerting rules by adjusting thresholds and by changing labels.
You can also define and add new custom alerting rules by constructing a query expression based on core platform metrics in the openshift-monitoring namespace.
For more information, see xref:../monitoring/managing-alerts.adoc#managing-core-platform-alerting-rules_managing-alerts[Managing alerting rules for core platform monitoring].

Release Note Type:
Enhancement
Release Note Status:
Done

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

Epic Goal

Allow admin users to create new alerting rules that target metrics in any namespace
Allow cloning of existing rules to simplify rule creation
Allow creation of silences for existing alert rules

Why is this important?

Currently, none of the platform-related metrics (exposed in the openshift-*, kube-* and default namespaces) can be used to form a new alerting rule. That makes it very difficult for administrators to enrich our out-of-the-box experience for OpenShift Container Platform with new rules that may be specific to their environments.

Additionally, we have had requests from customers to allow modification of our existing, out-of-the-box alerting rules (for instance, tweaking the alert expression or changing the severity label). Unfortunately, that is not easy: most rules come from several open source projects or other OpenShift components, and any modification would make upgrades no longer seamless. Imagine Kubernetes changes its metrics again (as in 1.14) and we have to update our rules. We would not know which modifications had been made (even a changed threshold is hard to reconcile if upstream changes it as well), and we would not be able to upgrade these rules.

Scenarios

I'd like to modify the query expression of an existing rule (because the threshold value doesn't match my environment).

Cloning the existing rule should create a new rule in the same namespace.
Modifications can then be made to the new rule.
(Optional) You can silence the existing rule.
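A minimal sketch of the cloning step, assuming rules are handled as plain Python dicts shaped like one entry of a PrometheusRule `spec.groups[].rules` list. The rule name `ExampleHighRestarts`, its expression and the `clone_rule` helper are hypothetical; the sketch only illustrates why cloning leaves the shipped rule intact (so it can still be upgraded), not how the console or the cluster-monitoring-operator implements it. Silencing the original rule is not shown.

```python
import copy

# Hypothetical out-of-the-box rule, shaped like one entry of a
# PrometheusRule spec.groups[].rules[] list. Name and expression are illustrative.
EXISTING_RULE = {
    "alert": "ExampleHighRestarts",
    "expr": "increase(kube_pod_container_status_restarts_total[15m]) > 5",
    "for": "15m",
    "labels": {"severity": "warning"},
}


def clone_rule(existing, new_alert, new_expr):
    """Return a modified copy of an alerting rule.

    The original dict is deep-copied, so the operator-managed rule stays
    untouched, including its nested labels.
    """
    clone = copy.deepcopy(existing)
    clone["alert"] = new_alert
    clone["expr"] = new_expr
    return clone


if __name__ == "__main__":
    tuned = clone_rule(
        EXISTING_RULE,
        new_alert="ExampleHighRestartsTuned",
        new_expr="increase(kube_pod_container_status_restarts_total[15m]) > 2",
    )
    print(tuned["alert"], tuned["expr"])
    # The original rule, and therefore its threshold, is unchanged.
    print(EXISTING_RULE["alert"], EXISTING_RULE["expr"])
```

Because the clone gets a new alert name, both rules can coexist in the same namespace and the original can be silenced separately.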

I'd like to create a new rule based on a metric only available in an openshift-* namespace.

Create a new PrometheusRule object inside the namespace that includes the metrics you need to form the alerting rule.
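A minimal sketch of such an object, built in Python and printed as JSON (which `oc apply -f` also accepts). It assumes the prometheus-operator `monitoring.coreos.com/v1` `PrometheusRule` schema; the `build_prometheus_rule` helper, object name, alert name, target namespace and expression are hypothetical, and whether the platform Prometheus actually evaluates a rule created this way depends on the cluster's monitoring configuration.

```python
import json


def build_prometheus_rule(name, namespace, alert, expr, duration, severity, summary):
    """Build a single-rule PrometheusRule manifest (monitoring.coreos.com/v1) as a dict."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PrometheusRule",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "groups": [
                {
                    "name": f"{name}.rules",
                    "rules": [
                        {
                            "alert": alert,
                            "expr": expr,
                            "for": duration,
                            "labels": {"severity": severity},
                            "annotations": {"summary": summary},
                        }
                    ],
                }
            ]
        },
    }


if __name__ == "__main__":
    # Hypothetical values: a rule on a metric scraped from pods in openshift-dns.
    manifest = build_prometheus_rule(
        name="example-dns-restarts",
        namespace="openshift-dns",
        alert="ExampleDNSPodRestarting",
        expr='increase(kube_pod_container_status_restarts_total{namespace="openshift-dns"}[15m]) > 3',
        duration="10m",
        severity="warning",
        summary="Pods in openshift-dns are restarting frequently.",
    )
    print(json.dumps(manifest, indent=2))
```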

I'd like to update the label of an existing rule.

Acceptance Criteria

CI - MUST be running successfully with tests automated
Release Technical Enablement - Provide necessary release enablement details and documents.
Ability to distinguish between rules deployed by us (CMO) and user created rules

Dependencies (internal and external)

Previous Work (Optional):

Open questions::

Distinguish between operator-created rules and user-created rules
Currently no such mechanism exists. This will need to be added to prometheus-operator or cluster-monitoring-operator.

Done Checklist

CI - CI is running, tests are automated and merged.
Release Enablement <link to Feature Enablement Presentation>
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Downstream build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>

is documented by

RHDEVDOCS-3423 [TP] Allow admin users to create new alerting rules based on platform metrics

Closed

is related to

OBSDA-210 Graduate alert overrides and alert relabelings to GA

In Progress

links to

KCS 6955611: Create new alerting rules based on platform-defined metrics in OpenShift

openshift/openshift-docs#43249: OCP 4.11 Release Notes Tracker

Sub-Tasks

1.	QE Tracker	Closed	Junqi Zhao
2.	TE Tracker	Closed	Eric Rich
3.	Docs Tracker	Closed	Brian Burt

Assignee:: Jan Fajerski

Reporter:: Jan Fajerski

QA Contact:: Junqi Zhao

Votes:: 4

Watchers:: 15

Created:: 2021/08/04 9:04 AM

Updated:: 2025/12/26 3:27 PM

Resolved:: 2022/08/02 2:26 PM
