Epic
Flexible user monitoring
Resolution: Done
Goals
- Provide a mechanism for customers to opt particular user-defined namespaces out of, or back in to, monitoring.
Non-Goals
- This mechanism applies only to User Workload Monitoring, not Cluster Monitoring. It does not let you opt system or platform namespaces out of, or in to, monitoring.
- Solving any Operator CRD conflicts or running two or more Prometheus Operators with different versions.
Motivation
OpenShift Monitoring currently supports monitoring workloads that are deployed in any namespace on the cluster via the User Workload Monitoring (UWM) stack. Some customers struggle with this approach: UWM monitors all namespaces by default and offers no configuration to trim the list of watched namespaces down to only the ones of interest.
There are two main scenarios for trimming down the list.
Scenario 1:
You are a cluster admin in a company called Acme XYZ. Acme XYZ is a platform provider that offers compute to their users. Every customer of Acme XYZ can request space where they are able to deploy their own workload. Some customer workloads are very important and need very high SLAs on the availability of the cluster.
Now, a cluster admin who works for Acme XYZ is responsible for many clusters that run a lot of namespaces, and each namespace is dedicated to a single customer. Acme XYZ does not provide a Monitoring solution for their customers, as each may have different requirements. Therefore, every customer is responsible for choosing a Monitoring solution that fits their needs.
On the other hand, the cluster admin team uses OpenShift’s Monitoring solution to observe the infrastructure components that matter for keeping their promised SLAs. They want the OpenShift Monitoring solution to focus on the platform-related namespaces and, additionally, on namespaces that the admin team owns. Namespaces owned by customers should not be observed by the OpenShift Monitoring solution, to avoid possible conflicts or double scraping of metrics.
Scenario 2:
You are a cluster admin in a company called Acme ABC. You manage various company-wide OpenShift clusters for different departments, and one of your responsibilities is to provide a single, central Monitoring stack so that each department does not have to learn the necessary tools themselves. For that, you are using OpenShift Monitoring. Every department gets its own namespace where it can run any workload it needs to be successful. That means the cluster admin does not control the information scraped from these workloads, and even with all the guidelines defined by the admin team, there are always outliers.
As a cluster admin, you would love to protect the Monitoring stack from those outliers so that one culprit does not impact the observation of workloads from all other departments. Therefore, if you see a department go over the norm, you would love to stop observing its namespace and give the department a heads-up to investigate possible improvements to the Monitoring data it exposes. As soon as the department is back within the norm, you can easily bring it back into the central Monitoring stack.
Alternatives
Acceptance Criteria
- Verify that, by default, all namespaces are observed unless configured otherwise.
- User workload monitoring does not pick up monitoring resources (service/pod monitors, rules) from namespaces that are excluded (either by label or by CMO configuration).
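The two criteria above can be illustrated with a minimal sketch of the selection logic. This is not the CMO implementation: the label key `example.com/user-monitoring`, its value `false`, and the `excluded_by_config` parameter are hypothetical placeholders, since this epic names neither the actual label nor the CMO configuration field.

```python
from dataclasses import dataclass, field

# Hypothetical label key and value (assumption): the epic does not name the real label.
EXCLUDE_LABEL_KEY = "example.com/user-monitoring"
EXCLUDE_LABEL_VALUE = "false"


@dataclass
class Namespace:
    """Minimal stand-in for a Kubernetes namespace: a name plus its labels."""
    name: str
    labels: dict = field(default_factory=dict)


def is_monitored(ns, excluded_by_config=frozenset()):
    """Return True unless the namespace is excluded by config or by label."""
    # Exclusion through (hypothetical) CMO configuration: an explicit list of names.
    if ns.name in excluded_by_config:
        return False
    # Exclusion through the (hypothetical) opt-out label on the namespace itself.
    return ns.labels.get(EXCLUDE_LABEL_KEY) != EXCLUDE_LABEL_VALUE


def monitored_namespaces(namespaces, excluded_by_config=frozenset()):
    """Names of namespaces whose monitoring resources would be picked up."""
    return [ns.name for ns in namespaces if is_monitored(ns, excluded_by_config)]
```

With no label set and an empty exclusion list, every namespace is returned, which matches the default behaviour required by the first criterion. Removing the label or the configuration entry brings a namespace back, as described in Scenario 2.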
Risk and Assumptions
Documentation Considerations
- Document which label can be set on namespaces to exclude them from user workload monitoring.
- Document CMO configuration field(s) to exclude namespaces from user workload monitoring.