Type: Epic
Resolution: Done
Priority: Normal
Fix Version/s: openshift-4.11
Affects Version/s: None
Component/s: cluster-monitoring-operator
Labels:
None

Epic Name:
SNO scrape intervals
Blocked:
False
Blocked Reason:
None
Ready:
False
Color Status:
Not Selected
Docs QE Status:
NEW
Epic Status:
To Do
Flagged:

Impediment
QE Status:
NEW
Hierarchy Progress Bar:

0% To Do, 0% In Progress, 100% Done
Target Version:

openshift-4.11

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

Epic Goal

Offer the option to double the scrape intervals for CMO-controlled ServiceMonitors in single-node deployments.
Alternatively, automatically double the same scrape intervals if CMO detects an SNO setup.

The potential target ServiceMonitors are:

kubelet
kube-state-metrics
node-exporter
etcd
openshift-state-metrics
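
The goal above can be sketched roughly as follows. This is only an illustration, not how CMO implements it: it assumes a ServiceMonitor is available as a plain dict shaped like the Prometheus Operator resource (metadata.name, spec.endpoints[].interval), and the helper names double_interval and double_scrape_intervals, the TARGETS set, and the single_node flag are hypothetical.

```python
import copy
import re

# Target ServiceMonitors named in this epic.
TARGETS = {
    "kubelet",
    "kube-state-metrics",
    "node-exporter",
    "etcd",
    "openshift-state-metrics",
}

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def double_interval(interval: str) -> str:
    """Double a simple single-unit duration such as "30s" or "1m".

    Assumption: compound durations like "1m30s" are not handled here.
    The result is always expressed in seconds.
    """
    match = re.fullmatch(r"(\d+)([smh])", interval)
    if match is None:
        raise ValueError(f"unsupported duration: {interval!r}")
    seconds = int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
    return f"{2 * seconds}s"


def double_scrape_intervals(service_monitor: dict, single_node: bool) -> dict:
    """Return a copy with doubled endpoint intervals for targeted monitors.

    Monitors outside TARGETS, and all monitors on multi-node clusters,
    are returned unchanged.
    """
    name = service_monitor["metadata"]["name"]
    if not single_node or name not in TARGETS:
        return service_monitor
    updated = copy.deepcopy(service_monitor)
    for endpoint in updated["spec"]["endpoints"]:
        # Endpoints without an explicit interval are left untouched.
        if "interval" in endpoint:
            endpoint["interval"] = double_interval(endpoint["interval"])
    return updated
```

Returning a copy rather than mutating in place keeps the default (multi-node) definition available for comparison.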

Why is this important?

Reduce CPU usage in SNO setups
Specifically doubling the scrape interval is important because:

We are confident that this has the least chance of interfering with existing rules. Our rate queries typically cover the last 2 minutes (no shorter time window). With 30-second scrape intervals (the current default), this gives us 4 samples in any 2-minute window. rate needs at least 2 samples to work, and we want another 2 for failure tolerance. Doubling the scrape interval still gives us 2 samples in most 2-minute windows. If a scrape fails, a few rule evaluations might fail intermittently.
We expect a measurable reduction in CPU usage (see previous work).
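
The sample-count reasoning above can be checked with a small idealized model. It is a sketch under stated assumptions: scrapes land exactly on multiples of the interval with no jitter, the rate window is treated as left-open, and the function name samples_in_window and its parameters are hypothetical.

```python
def samples_in_window(scrape_interval_s, window_s, eval_time_s, failed=frozenset()):
    """Count successful scrapes with timestamps in (eval_time - window, eval_time].

    Assumptions: scrapes happen at 0, interval, 2*interval, ... with no
    jitter; `failed` holds the timestamps of scrapes that returned no sample.
    """
    window_start = eval_time_s - window_s
    count = 0
    t = 0
    while t <= eval_time_s:
        if t > window_start and t not in failed:
            count += 1
        t += scrape_interval_s
    return count


# rate needs at least 2 samples in the window to produce a result.
MIN_SAMPLES_FOR_RATE = 2

default_ok = samples_in_window(30, 120, 600)                 # 30s interval
doubled_ok = samples_in_window(60, 120, 600)                 # 60s interval
doubled_one_failure = samples_in_window(60, 120, 600, failed={540})
```

In this model the 30-second interval yields 4 samples and the 60-second interval yields 2, so one failed scrape at the doubled interval leaves a single sample, which is below what rate needs. That matches the intermittent rule evaluation failures described above.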

Scenarios

RAN deployments (Telco Edge) are SNO deployments. In these setups a full CMO deployment is often not needed, and the default setup consumes too many resources. OpenShift as a whole has only very limited CPU cycles available, and too many of them are spent on monitoring.

Acceptance Criteria

CI - MUST be running successfully with tests automated
Release Technical Enablement - Provide necessary release enablement details and documents.
...

Previous Work (Optional):

https://issues.redhat.com/browse/MON-1569

Open questions:

Whether doubling some scrape intervals reduces CPU usage to fit into the assigned budget

Non goals

Allow arbitrarily long scrape intervals. This would interfere with alerting and recording rules.
Implement a global override to scrape intervals.

links to

openshift/openshift-docs#43249: OCP 4.11 Release Notes Tracker

QE Tracker

Closed

Unassigned

Assignee: Jan Fajerski

Reporter: Jan Fajerski

QA Contact: Tai Gao

Votes: 0

Watchers: 11

Created: 2022/04/12 1:19 PM

Updated: 2023/04/24 12:36 PM

Resolved: 2022/07/28 12:01 PM
