-
Task
-
Resolution: Done
Overview:
- In the OpenShift Container Native Virtualization (CNV) QE team, we frequently deploy clusters to run tests, validate bug fixes, and more.
- OpenShift Container Platform includes a pre-configured, pre-installed, and self-updating monitoring stack that provides monitoring for core platform components. ([1])
- Currently, we do not persist any of the data collected by Prometheus, so the data lives only as long as the Prometheus pod does.
- We would like to start persisting the cluster data collected by Prometheus.
- We would also like to take the current default stack and scale it horizontally with a centralized Prometheus cluster.
Why do we want to persist metrics?
- Prometheus data lives only as long as the pod that collected it.
- We should be able to debug failures and events post-mortem, after a cluster is no longer available.
Why do we need a centralized cluster?
- Phase #1: We would like to have a dashboard where we can watch and analyze metrics data from our clusters.
- One-stop source for all the metrics related to running clusters
Requirements:
- What resources need to be collected?
Metrics from Prometheus
- What volume of data will be collected?
rate(prometheus_tsdb_compaction_chunk_size_bytes_sum[4d]) = 2723.675487
rate(prometheus_tsdb_compaction_chunk_samples_sum[4d]) = 2819.718854
rate(prometheus_tsdb_head_samples_appended_total[4d]) = 3064.62018
- What is the expected or forecasted growth in collected data?
10% growth in data per month
- How many users will require read access?
The entire OpenShift Virtualization (CNV) department.
- How many clients will be emitting data?
1 Prometheus server per cluster. For the PoC we plan to write metrics from 15 clusters.
- How long should data be retained?
1 month
- How many independent service accounts do you need to write data into the platform?
1 service account will be used for all clusters hosted on all platforms (AWS/Azure/IBMC/PSI/other)
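For reference, the answers above can be turned into a rough storage estimate. The sketch below is not part of the onboarding request. It uses the rule of thumb from the Prometheus storage documentation (retention seconds x ingested samples per second x bytes per sample) and rests on several assumptions: the three rates are per-cluster figures, "1 Month" means 30 days, and the 10% monthly growth applies to the ingestion rate. The function names are made up for illustration.

```python
# Rough sizing sketch based on the figures in this ticket.
# Assumption: the three rates below describe a single cluster.
CHUNK_BYTES_PER_SEC = 2723.675487      # rate(prometheus_tsdb_compaction_chunk_size_bytes_sum[4d])
CHUNK_SAMPLES_PER_SEC = 2819.718854    # rate(prometheus_tsdb_compaction_chunk_samples_sum[4d])
INGESTED_SAMPLES_PER_SEC = 3064.62018  # rate(prometheus_tsdb_head_samples_appended_total[4d])

CLUSTERS = 15            # PoC scope stated in the ticket
RETENTION_DAYS = 30      # "1 Month", assumed to mean 30 days
MONTHLY_GROWTH = 0.10    # 10% growth in data per month


def bytes_per_sample() -> float:
    """Average compressed size of one sample, derived from compaction stats."""
    return CHUNK_BYTES_PER_SEC / CHUNK_SAMPLES_PER_SEC


def retained_bytes(clusters: int = CLUSTERS, months_from_now: int = 0) -> float:
    """Estimated bytes held at steady state for the given number of clusters.

    Assumption: growth compounds monthly on the ingestion rate.
    """
    retention_seconds = RETENTION_DAYS * 24 * 60 * 60
    growth_factor = (1 + MONTHLY_GROWTH) ** months_from_now
    samples_per_sec = INGESTED_SAMPLES_PER_SEC * growth_factor
    return retention_seconds * samples_per_sec * bytes_per_sample() * clusters


if __name__ == "__main__":
    gib = 1024 ** 3
    print(f"bytes per sample: {bytes_per_sample():.3f}")
    print(f"one cluster, today: {retained_bytes(clusters=1) / gib:.1f} GiB")
    print(f"{CLUSTERS} clusters, today: {retained_bytes() / gib:.1f} GiB")
    print(f"{CLUSTERS} clusters, in 12 months: {retained_bytes(months_from_now=12) / gib:.1f} GiB")
```

The estimate covers compressed sample data only and ignores index and replication overhead on the receiving side, so it should be read as a lower bound.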
Request Checklist:
[V] Description of your service
[V] Explanation of why you would benefit from sending data to Observatorium
[V] All of the answers to the collected requirements from section 1
[V] The organization ID of the new customer portal account from section 2
`260Z0RxxPFealmmU9ApvsvOZaps`
[V] A link to the Service Now ticket from section 3
[V] The email address specified in the service account in section 3
`cnv-qe-devops@redhat.com`
[V] A link to the merge request from section 4
https://gitlab.cee.redhat.com/service/ocm-resources/-/merge_requests/2073
- is triggered by OBSDA-37 Onboard CNV-QE OpenShift Clusters Metrics Onto Observatorium (Closed)