Epic
Resolution: Done
UWM cross namespace alerts
With this update, you can create user-defined alerting and recording rules that query multiple namespaces at the same time.
Epic Goal
- Allow user-defined monitoring administrators to define PrometheusRule objects spanning multiple or all user namespaces.
Why is this important?
- There's often a need to define similar alerting rules for multiple user namespaces (typically when the rule works on platform metrics such as kube-state-metrics or kubelet metrics).
- In the current situation, such a rule has to be duplicated in every user namespace, which doesn't scale well:
  - 100 expressions selecting 1 namespace each are more expensive to evaluate than 1 expression selecting 100 namespaces.
  - Updating 100 PrometheusRule objects is more time-consuming and error-prone than updating 1 PrometheusRule object.
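The duplication described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the metric (`kube_pod_container_status_restarts_total` from kube-state-metrics), the threshold, the namespace names, and both helper functions are hypothetical and not taken from this epic. It contrasts one expression per namespace with a single expression that selects all namespaces through a regex matcher.

```python
from typing import Dict, List

# Hypothetical example metric (exposed by kube-state-metrics) and threshold.
METRIC = "kube_pod_container_status_restarts_total"
THRESHOLD = 5


def per_namespace_exprs(namespaces: List[str]) -> Dict[str, str]:
    """Current situation: one PromQL expression per namespace-scoped rule."""
    return {
        ns: f'increase({METRIC}{{namespace="{ns}"}}[1h]) > {THRESHOLD}'
        for ns in namespaces
    }


def cross_namespace_expr(namespaces: List[str]) -> str:
    """Cross-namespace rule: a single expression selecting every namespace.

    Kubernetes namespace names are DNS labels (lowercase letters, digits
    and '-'), so they need no regex escaping inside the matcher.
    """
    matcher = "|".join(sorted(namespaces))
    return f'increase({METRIC}{{namespace=~"{matcher}"}}[1h]) > {THRESHOLD}'


if __name__ == "__main__":
    # Hypothetical namespace names.
    namespaces = [f"team-{i}" for i in range(100)]
    duplicated = per_namespace_exprs(namespaces)
    print(f"{len(duplicated)} expressions to maintain today")
    print("1 expression with a cross-namespace rule:")
    print(cross_namespace_expr(namespaces[:3]))
```

With the per-namespace approach, changing the threshold means editing every generated rule; with the single expression, only one object changes.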
Scenarios
- A user-defined monitoring admin can provision a PrometheusRule object whose PromQL expressions aren't scoped to the namespace where the object is defined.
- A cluster admin can forbid user-defined monitoring admins from using cross-namespace rules.
Acceptance Criteria
- CI - MUST be running successfully with tests automated
- Release Technical Enablement - Provide necessary release enablement details and documents.
- Follow FeatureGate Guidelines
- ...
Dependencies (internal and external)
- None (Prometheus-operator supports defining namespace-enforcement exceptions for PrometheusRules).
Previous Work (Optional):
Open questions:
In terms of risks:
- UWM admins may configure rules which overload the platform Prometheus and Thanos Querier.
  - This is not very different from the current situation, where Thanos Ruler can already run many UWM rules.
  - All requests go through the Thanos Querier first, which should "protect" Prometheus from DoS queries (there's a hard limit of 4 in-flight queries per Thanos Querier pod).
- UWM admins may configure rules that access platform metrics unavailable to application owners (e.g. metrics without a namespace label, or metrics whose namespace label matches openshift-*).
  - In practice, UWM admins already have access to these metrics, so it isn't a big change.
  - It also enables use cases such as ROSA admin customers who can't deploy their platform alerts to openshift-monitoring today. With this new feature, that limitation will be lifted.
Done Checklist
- CI - CI is running, tests are automated and merged.
- Release Enablement <link to Feature Enablement Presentation>
- DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
- DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
- DEV - Downstream build attached to advisory: <link to errata>
- QE - Test plans in Polarion: <link or reference to Polarion>
- QE - Automated tests merged: <link or reference to automated tests>
- DOC - Downstream documentation merged: <link to meaningful PR>
- is documented by: OBSDOCS-978 Document how user-workload monitoring admins can write general purpose alerting rules that can span several namespaces (Review)
- is duplicated by: MON-3395 ROSA: Allow customer-defined platform-wide alerting rules (Closed)
- relates to: OBSDA-237 [CEE.neXT] User-workload monitoring "admins" should be able to write general purpose alerting rules that can span several namespaces. (In Progress)
- links to