- Cluster admins want to configure the retention size for their metrics.
Why is this important?
- While it is possible to define how long metrics are retained on disk, it is not possible to tell the cluster monitoring operator how much data to keep. For OSD/ROSA in particular, configuring the retention size based on the persistent volume size would make the fleet easier to manage: it would prevent the storage from filling up and monitoring from going down when too many metrics are produced.
- As a cluster admin, I want to define the maximum amount of data to be retained on the persistent volume.
- CI - MUST be running successfully with tests automated
- Release Technical Enablement - Provide necessary release enablement details and documents.
- The cluster-monitoring-config configmap and the user-workload-monitoring-config configmap allow configuring the retention size for:
- Prometheus (Platform and UWM)
- Thanos Ruler (to be confirmed)
- Proper validation is in place preventing bad user inputs from breaking the stack.
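
The validation criterion above can be illustrated with a minimal Python sketch. It is not the operator's actual implementation. The function names, the accepted format (a non-negative integer followed by B, KB, MB, GB, TB, PB or EB), the use of powers of 1024, and the check against the persistent volume size are all assumptions made for illustration.

```python
import re

# Hypothetical accepted format: a non-negative integer followed by a unit.
# The real operator may accept other forms (decimals, "MiB", ...).
_UNITS = {"B": 0, "KB": 1, "MB": 2, "GB": 3, "TB": 4, "PB": 5, "EB": 6}
_SIZE_RE = re.compile(r"([0-9]+)(B|KB|MB|GB|TB|PB|EB)")


def parse_retention_size(value: str) -> int:
    """Return the size in bytes, or raise ValueError for malformed input."""
    match = _SIZE_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"invalid retention size: {value!r}")
    number, unit = match.groups()
    # Assumption of this sketch: units are powers of 1024.
    return int(number) * 1024 ** _UNITS[unit]


def validate_retention_size(value: str, volume_bytes: int) -> int:
    """Reject sizes that are malformed or larger than the volume."""
    size = parse_retention_size(value)
    if size > volume_bytes:
        raise ValueError("retention size exceeds the persistent volume size")
    return size
```

For example, `validate_retention_size("10GB", 20 * 1024**3)` returns the size in bytes, while `"10 GB"`, `"ten"`, or a value larger than the volume raise `ValueError` before the setting could reach the monitoring stack.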
Dependencies (internal and external)
- Thanos Ruler doesn't support retention size (only retention time).
Previous Work (Optional):
- CI - CI is running, tests are automated and merged.
- Release Enablement <link to Feature Enablement Presentation>
- DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
- DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
- DEV - Downstream build attached to advisory: <link to errata>
- QE - Test plans in Polarion: <link or reference to Polarion>
- QE - Automated tests merged: <link or reference to automated tests>
- DOC - Downstream documentation merged: <link to meaningful PR>
- is documented by
RHDEVDOCS-3919 Document size-based retention config for metrics
- links to
|QE Tracker||Closed||Hongyan Li|
|TE Tracker||Closed||Eric Rich|
|Docs Tracker||Closed||Brian Burt|