- Feature
- Resolution: Done
- Major
- monitoring
- Kubernetes-native Infrastructure
- 0% To Do, 0% In Progress, 100% Done
- Telco 5G RAN
Problem Alignment
The Problem
Customers typically run more than one cluster and/or deploy applications across different regions. In such a hybrid cloud environment, aggregating metrics is a key requirement so that administrators and application owners do not have to log in to individual clusters to troubleshoot specific problems. Because Red Hat does not offer a standalone metrics aggregation service, customers have started to use existing, home-grown technologies based on, for example, InfluxDB or Kafka to achieve that.
In summary:
- OpenShift Monitoring is optimized for short-term retention only.
- Red Hat does not offer a central metrics aggregation service yet.
- Customers use existing, home-grown technologies to distribute information to other stakeholders in their company.
High-Level Approach
Expose the Prometheus remote-write configuration through the OpenShift Monitoring ConfigMaps (Cluster and User Workload) so that customers can push time-series data to a remote location.
Please note that we do not plan to support specific third-party “receivers” with this solution. Customers are responsible for ensuring that an appropriate receiving component that implements the “remote-write” API is up and running. Here is a list of possible “receiver” plugins.
Goal & Success
- Introduce “ease of use” features for configuring parts of remote-write, to reduce the risk of misconfiguration.
- Allow customers to push metrics off the cluster, enabling aggregation use cases and giving our partners more options to integrate with OpenShift, e.g. long-term retention or security/analytics scenarios.
Solution Alignment
Key Capabilities
- As an OpenShift administrator, I want to configure remote-write for both the out-of-the-box (OOTB) infrastructure bundle and the user workload stack, so that time-series data will be available on the system of my choice.
- As an OpenShift administrator, I want to easily build an allow list of metrics that should be pushed externally.
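As an illustration of both capabilities on the user workload side, the following is a minimal sketch rather than the final design. It assumes the user workload ConfigMap is named `user-workload-monitoring-config` in the `openshift-user-workload-monitoring` namespace, that the new field is called `remoteWrite` under a `prometheus` key, and that the allow list is expressed with Prometheus Operator style `writeRelabelConfigs`. The URL and the metric names are hypothetical.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config        # assumed ConfigMap name
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    prometheus:
      # Assumed field name for the remote-write configuration.
      remoteWrite:
      - url: "https://metrics.example.com/api/v1/write"  # hypothetical receiver URL
        # Allow list with several entries: metric names are joined with "|".
        # Prometheus anchors relabel regexes, so each alternative must match
        # the whole metric name.
        writeRelabelConfigs:
        - sourceLabels: [__name__]
          regex: "http_requests_total|http_request_duration_seconds_.*"
          action: keep
```

An "ease of use" feature could generate this relabel rule from a plain list of metric names, so that administrators do not have to write the regex by hand.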
Key Flows
A user configures one of the available ConfigMaps so that node_cpu_seconds_total is written to a remote Thanos system:
- Administrator opens the cluster-monitoring-config ConfigMap.
- They add a new field to configure remote write.
- They add the node_cpu_seconds_total metric to the allow list.
- They add the remote URL for the Thanos receiver.
- They add a Secret to configure authentication against the remote service.
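The flow above might result in a ConfigMap like the following. This is a minimal sketch under stated assumptions, not the final design: the `remoteWrite` field name, its placement under `prometheusK8s`, and the `basicAuth` and `writeRelabelConfigs` shapes are modelled on the Prometheus Operator remote-write spec, and the receiver URL and the Secret name `remote-write-credentials` are hypothetical.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      # Assumed field name for the new remote-write configuration.
      remoteWrite:
      - url: "https://thanos-receive.example.com/api/v1/receive"  # hypothetical Thanos receiver URL
        # Credentials are read from a Secret (hypothetical name and keys).
        basicAuth:
          username:
            name: remote-write-credentials
            key: username
          password:
            name: remote-write-credentials
            key: password
        # Allow list: keep only node_cpu_seconds_total.
        writeRelabelConfigs:
        - sourceLabels: [__name__]
          regex: node_cpu_seconds_total
          action: keep
```

Because the relabel action is `keep`, any series whose name does not match the regex is not sent to the remote endpoint.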
Additional resources
- relates to RFE-719 [RFE] Support Prometheus remote_write / remoteWrite in OCP 4.X (Accepted)