Type: Epic
Resolution: Done
Priority: Critical
Fix Version/s: openshift-4.11
Affects Version/s: None
Component/s: cluster-monitoring-operator
Labels:
- osd-rosa
- service-delivery-prio-asks

Epic Name:
Support size-based retention for metrics
Blocked:
False
Ready:
False
Docs QE Status:
NEW
Epic Status:
To Do
Feature Link:
OBSDA-27 - Enable prometheus retention.size via CMO
Flagged:

Impediment
Parent Link:
OBSDA-27 - Enable prometheus retention.size via CMO
QE Status:
NEW
Hierarchy Progress Bar:

0% To Do, 0% In Progress, 100% Done

Epic Goal

Cluster admins want to configure the retention size for their metrics.

Why is this important?

Cluster admins can already define how long metrics are retained on disk, but they cannot tell the cluster monitoring operator how much data to keep. For OSD/ROSA in particular, being able to set the retention size based on the persistent volume size would simplify fleet management: it would prevent the storage from filling up, and monitoring from going down, when too many metrics are produced.
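To illustrate the OSD/ROSA use case, here is a minimal sketch of deriving a retention size from a persistent volume's capacity. The function name `retention_size_for_pv` and the 80% default are hypothetical and not part of CMO. The sketch assumes the value is written with the `MB` suffix, which Prometheus documents as a power-of-two unit (1MB = 1024 * 1024 bytes).

```python
def retention_size_for_pv(pv_capacity_bytes: int, usable_percent: int = 80) -> str:
    """Return a retention size string that leaves headroom on the volume.

    usable_percent is an assumed safety margin: the share of the volume
    that metrics data may occupy before old blocks are dropped.
    """
    if pv_capacity_bytes <= 0:
        raise ValueError("pv_capacity_bytes must be positive")
    if not 0 < usable_percent <= 100:
        raise ValueError("usable_percent must be in (0, 100]")

    # Integer arithmetic only, to avoid floating-point rounding surprises.
    budget_bytes = pv_capacity_bytes * usable_percent // 100
    budget_mb = budget_bytes // (1024 * 1024)
    return f"{budget_mb}MB"


if __name__ == "__main__":
    # A 100 GiB volume with the default 80% budget.
    print(retention_size_for_pv(100 * 1024**3))
```

The margin matters because the volume also has to hold data beyond the retained blocks (for example the write-ahead log), so setting the retention size equal to the full volume size would not prevent the disk from filling up.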

Scenarios

As a cluster admin, I want to define the maximum amount of data to be retained on the persistent volume.

Acceptance Criteria

CI - MUST be running successfully with tests automated
Release Technical Enablement - Provide necessary release enablement details and documents.
The cluster-monitoring-config and user-workload-monitoring-config configmaps allow configuring the retention size for:
- Prometheus (Platform and UWM)
- Thanos Ruler (to be confirmed)
Proper validation is in place preventing bad user inputs from breaking the stack.
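
As an illustration of the validation criterion, the sketch below rejects malformed retention size values before they would reach Prometheus. It is not the CMO implementation (which is written in Go). The helper name `validate_retention_size` and the regular expression are assumptions, modeled on the unit suffixes that Prometheus documents for its retention size flag (B, KB, MB, GB, TB, PB, EB).

```python
import re

# Assumed pattern: a positive integer followed by one documented unit
# suffix, or a bare "0". Hypothetical, not copied from CMO.
_RETENTION_SIZE = re.compile(r"(0|[1-9][0-9]*(B|KB|MB|GB|TB|PB|EB))")


def validate_retention_size(value: str) -> str:
    """Return the value unchanged if it looks valid, else raise ValueError."""
    if not isinstance(value, str) or not _RETENTION_SIZE.fullmatch(value):
        raise ValueError(
            f"invalid retention size {value!r}: expected a number followed "
            "by one of B, KB, MB, GB, TB, PB, EB (for example '10GB')"
        )
    return value


if __name__ == "__main__":
    for candidate in ["10GB", "512MB", "10 GB", "ten", "-5GB"]:
        try:
            validate_retention_size(candidate)
            print(candidate, "accepted")
        except ValueError as err:
            print(candidate, "rejected:", err)
```

Failing early with a clear message keeps a typo in the configmap from propagating into a Prometheus configuration that cannot be applied.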

Dependencies (internal and external)

Thanos Ruler doesn't support size-based retention (only time-based retention).

Previous Work (Optional):

None

Open questions:

None

Done Checklist

CI - CI is running, tests are automated and merged.
Release Enablement <link to Feature Enablement Presentation>
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Downstream build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>

is documented by

RHDEVDOCS-3919 Document size-based retention config for metrics

Closed

links to

openshift/cluster-monitoring-operator#1579: MON-2193: pkg/manifests: Expose retention size settings for Platform Prometheus

openshift/cluster-monitoring-operator#1630: MON-2193: pkg/manifests: Expose retention size settings for UWM Prometheus

openshift/openshift-docs#43249: OCP 4.11 Release Notes Tracker

Assignee: Jayapriya Pai

Reporter: Simon Pasquier

QA Contact: Hongyan Li

Votes: 1

Watchers: 14

Created: 2022/01/31 3:33 PM

Updated: 2022/08/26 2:28 PM

Resolved: 2022/07/28 8:57 AM
