Type: Epic
Resolution: Done
Priority: Normal
Fix Version/s: openshift-4.18
Affects Version/s: None
Component/s: None
Labels:
- doc-impact
- groomed

Epic Name:
UWM cross namespace alerts
Blocked:
False
Blocked Reason:

None
Ready:
False
Color Status:
Not Selected
Docs QE Status:
NEW
Epic Status:
To Do
QE Status:
NEW
Hierarchy Progress Bar:

0% To Do, 0% In Progress, 100% Done
Release Note Text:
With this update, you can create user-defined alerting and recording rules that query multiple namespaces at the same time.
Release Note Type:
Feature
Size:
L

Target Version:

openshift-4.18

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

Intelligence Requested:
Market:

Epic Goal

Allow user-defined monitoring administrators to define PrometheusRule objects whose expressions span multiple or all user namespaces.

Why is this important?

There's often a need to define similar alerting rules for multiple user namespaces (typically when the rule works on platform metrics such as kube-state-metrics or kubelet metrics).
Currently, such a rule has to be duplicated in each user namespace, which doesn't scale well:
- 100 expressions selecting 1 namespace each are more expensive than 1 expression selecting 100 namespaces.
- updating 100 PrometheusRule resources is more time-consuming and error-prone than updating 1 PrometheusRule object.
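
The difference between the two approaches can be sketched in a few lines of Python. This is an illustration only: the helper names and the team-* namespaces are hypothetical, and kube_pod_container_status_restarts_total is used simply as a representative kube-state-metrics metric. It shows the shape of the queries, not how the monitoring stack generates them.

```python
def per_namespace_exprs(metric, namespaces):
    """Build one selector per namespace, as when a rule is duplicated."""
    return [f'{metric}{{namespace="{ns}"}}' for ns in namespaces]


def cross_namespace_expr(metric, namespaces):
    """Build a single selector matching all given namespaces with a regex matcher.

    Namespace names are DNS labels (lowercase alphanumerics and '-'), so they
    contain no regex metacharacters that need escaping outside a character class.
    """
    pattern = "|".join(namespaces)
    return f'{metric}{{namespace=~"{pattern}"}}'


# Hypothetical namespaces, for illustration only.
namespaces = ["team-a", "team-b", "team-c"]
metric = "kube_pod_container_status_restarts_total"

duplicated = per_namespace_exprs(metric, namespaces)
combined = cross_namespace_expr(metric, namespaces)

print(len(duplicated), "expressions when duplicated per namespace")
print("1 expression when combined:", combined)
```

With the per-namespace approach the number of rule objects to evaluate and maintain grows with the number of namespaces; the combined form stays at one.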

Scenarios

A user-defined monitoring admin can provision a PrometheusRule object whose PromQL expressions aren't scoped to the namespace where the object is defined.
A cluster admin can forbid user-defined monitoring admins from using cross-namespace rules.
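
As a rough sketch of the first scenario, the Python snippet below assembles a PrometheusRule manifest whose expression carries no namespace matcher and therefore aggregates across namespaces. The object name, the namespace (example-uwm-admin), the alert name, and the threshold are all hypothetical. How a given cluster exempts such a rule from namespace enforcement is not shown here and depends on the cluster's monitoring configuration.

```python
import json


def build_cross_namespace_rule(name, namespace, expr):
    """Return a PrometheusRule manifest (as a dict) holding one alerting rule."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PrometheusRule",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "groups": [
                {
                    "name": "cross-namespace-example",
                    "rules": [
                        {
                            # Hypothetical alert name and threshold.
                            "alert": "PodRestartingFrequently",
                            "expr": expr,
                            "for": "15m",
                            "labels": {"severity": "warning"},
                        }
                    ],
                }
            ],
        },
    }


# No namespace matcher: the expression is evaluated over every namespace
# visible to the rule evaluator, and results are grouped by namespace and pod.
expr = (
    "sum by (namespace, pod) "
    "(increase(kube_pod_container_status_restarts_total[1h])) > 5"
)

# "example-uwm-admin" is an assumed namespace, not one named in this epic.
manifest = build_cross_namespace_rule("cross-namespace-restarts", "example-uwm-admin", expr)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and reviewed before it is applied to a cluster.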

Acceptance Criteria

CI - MUST be running successfully with tests automated
Release Technical Enablement - Provide necessary release enablement details and documents.
Follow FeatureGate Guidelines
...

Dependencies (internal and external)

None (Prometheus-operator supports defining namespace-enforcement exceptions for PrometheusRules).

Previous Work (Optional):

Open questions:

In terms of risks:

UWM admins may configure rules which overload the platform Prometheus and Thanos Querier.
- This is not very different from the current situation where ThanosRuler can run many UWM rules.
- All requests go first through the Thanos Querier which should "protect" Prometheus from DoS queries (there's a hard limit of 4 in-flight queries per Thanos Querier pod).
UWM admins may configure rules that access platform metrics unavailable to application owners (e.g. metrics without a namespace label, or metrics whose namespace label matches openshift-*).
- In practice, UWM admins already have access to these metrics so it isn't a big change.
- It also enables use cases such as ROSA customer admins, who can't deploy their platform alerts to openshift-monitoring today. This feature lifts that limitation.

Done Checklist

CI - CI is running, tests are automated and merged.
Release Enablement <link to Feature Enablement Presentation>
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Downstream build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>

is documented by

OBSDOCS-978 Document how user-workload monitoring admins can write general purpose alerting rules that can span several namespaces

Closed

is duplicated by

MON-3395 ROSA: Allow customer-defined platform-wide alerting rules

Closed

relates to

OBSDA-237 [CEE.neXT] User-workload monitoring "admins" should be able to write general purpose alerting rules that can span several namespaces.

Closed

links to

openshift/openshift-docs#78965: OBSDOCS-190: user-defined-projects-first-steps-config

Assignee: Simon Pasquier

Reporter: Apurva Nisal

QA Contact: Tai Gao

Votes: 1

Watchers: 7

Created: 2023/10/02 1:15 PM

Updated: 2024/11/15 1:56 AM

Resolved: 2024/11/15 1:56 AM
