Type: Epic
Resolution: Done
Priority: Normal
Fix Version/s: openshift-4.9
Affects Version/s: None
Component/s: None
Labels:
- doc-ack
- groomed
- px-ack
- qe-ack

Epic Name:
Flexible user monitoring
Blocked:
False
Ready:
False
Docs QE Status:
NEW
Epic Status:
Done
Hierarchy Progress Bar:

0% To Do, 0% In Progress, 100% Done
Release Note Text:
Undefined

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

Market:

Goals

Provide a mechanism for customers to opt-out/opt-in from observing particular user-defined namespaces.

Non-Goals

This mechanism applies only to User Workload Monitoring, not Cluster Monitoring, so it cannot be used to opt out of (or into) observation of system/platform-relevant namespaces.
Solving any Operator CRD conflicts, or running two or more Prometheus Operators with different versions.

Motivation

OpenShift Monitoring currently supports monitoring workloads that are deployed in any namespace on the cluster via the User Workload Monitoring (UWM) stack. Some customers struggle with UWM monitoring all namespaces by default, with no configuration available to trim the list of watched namespaces to only the ones that are of interest.

There are two main scenarios for trimming down the list.

Scenario 1:

You are a cluster admin in a company called Acme XYZ. Acme XYZ is a platform provider that offers compute to their users. Every customer of Acme XYZ can request space where they are able to deploy their own workload. Some customer workloads are very important and need very high SLAs on the availability of the cluster.

Now, a cluster admin who works for Acme XYZ is responsible for many clusters that run a lot of namespaces and each namespace is dedicated to a single customer. Acme XYZ does not provide a Monitoring solution for their customers as each may have different requirements. Therefore, every customer is responsible for choosing a Monitoring solution that fits their needs.

On the other hand, the cluster admin team uses OpenShift’s Monitoring solution to observe the infrastructure components that matter for keeping their promised SLAs. They want the OpenShift Monitoring solution to focus on the platform-related namespaces and, additionally, on namespaces that the admin team owns. Namespaces owned by customers should not be observed by the OpenShift Monitoring solution, to avoid possible conflicts or double scraping of metrics.

~~Scenario 2~~

You are a cluster admin in a company called Acme ABC. You manage various company-wide OpenShift clusters for different departments, and one of your responsibilities is to provide a single, central Monitoring stack so that each department does not have to learn the necessary tools themselves. For that, you are using OpenShift Monitoring. Every department gets its own namespace where it can run any workload it needs to be successful. That means the cluster admin does not control the information scraped from these workloads, and even with all the guidelines defined by the admin team, there are always outliers.

As a cluster admin, you would love to protect the Monitoring stack from those outliers so that one culprit does not impact observing workloads from all other departments. Therefore, if you see a department go over the norm, you would love to stop observing its namespace and give the department a heads-up to investigate possible improvements to the Monitoring data it exposes. As soon as the department is back within the norm, you can easily bring it back into the central Monitoring stack.

Alternatives

Acceptance Criteria

Verify that, by default, all namespaces are observed unless configured otherwise.
~~User workload monitoring doesn't pick up monitoring resources (service/pod monitors, rules) from namespaces that are excluded (either by label or CMO configuration).~~
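
The default-on, label-based opt-out behaviour described in these criteria can be sketched roughly as follows. This is only an illustration of the selection logic, not the actual implementation: the label key and value (`openshift.io/user-monitoring` set to `false`) are assumptions, since this epic does not name the label, and the function names are hypothetical.

```python
# Assumed label key/value for illustration; the epic itself does not name them.
EXCLUDE_LABEL = "openshift.io/user-monitoring"
EXCLUDE_VALUE = "false"


def is_observed(labels):
    """Return True unless the namespace carries the opt-out label."""
    return labels.get(EXCLUDE_LABEL) != EXCLUDE_VALUE


def observed_namespaces(namespaces):
    """Given a mapping of namespace name -> labels, return the observed names."""
    return sorted(name for name, labels in namespaces.items() if is_observed(labels))


# Hypothetical cluster state: only "customer-1" has opted out.
namespaces = {
    "team-a": {},
    "customer-1": {EXCLUDE_LABEL: EXCLUDE_VALUE},
    "team-b": {"env": "prod"},
}
print(observed_namespaces(namespaces))
```

A namespace with no labels, or with unrelated labels, stays observed, which matches the first criterion; removing the label from an excluded namespace brings it back into scope.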

Risk and Assumptions

Documentation Considerations

Document which label can be set on namespaces to exclude them from user workload monitoring.
~~Document CMO configuration field(s) to exclude namespaces from user workload monitoring.~~

Open Questions

Additional Notes

Assignee: Simon Pasquier

Reporter: Christian Heidenreich (Inactive)

QA Contact: Junqi Zhao

Votes: 1

Watchers: 14

Created: 2021/04/30 2:56 PM

Updated: 2025/09/13 1:41 AM

Resolved: 2021/09/02 2:18 PM
