Epic
Resolution: Done
SNO scrape intervals
Epic Goal
- Offer the option to double the scrape intervals of CMO-controlled ServiceMonitors in single-node (SNO) deployments
- Alternatively, automatically double the same scrape intervals if CMO detects an SNO setup
The potential target ServiceMonitors are:
- kubelet
- kube-state-metrics
- node-exporter
- etcd
- openshift-state-metrics
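A minimal Python sketch of the intended behaviour, for illustration only. It assumes CMO already knows whether the cluster is SNO and whether doubling is enabled (either via the opt-in option or automatically). The names `effective_intervals`, `single_node`, `double_enabled` and the ServiceMonitor `example-app` are hypothetical and are not CMO's actual code or configuration API.

```python
from __future__ import annotations

# Hypothetical sketch: not CMO's real code or config API.
# ServiceMonitors whose scrape interval would be doubled, as listed in this epic.
TARGET_SERVICE_MONITORS = frozenset({
    "kubelet",
    "kube-state-metrics",
    "node-exporter",
    "etcd",
    "openshift-state-metrics",
})

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_interval(interval: str) -> int:
    """Parse a single-unit duration such as '30s' or '1m' into seconds."""
    value, unit = interval[:-1], interval[-1:]
    if unit not in _UNIT_SECONDS or not value.isdigit():
        raise ValueError(f"unsupported interval: {interval!r}")
    return int(value) * _UNIT_SECONDS[unit]


def effective_intervals(
    intervals: dict[str, str], single_node: bool, double_enabled: bool = True
) -> dict[str, str]:
    """Return each ServiceMonitor's scrape interval after applying the SNO policy.

    single_node: assumed to come from CMO's (hypothetical) SNO detection.
    double_enabled: stands in for either the opt-in option or the automatic mode.
    """
    factor = 2 if single_node and double_enabled else 1
    result = {}
    for name, interval in intervals.items():
        seconds = parse_interval(interval)
        if name in TARGET_SERVICE_MONITORS:
            seconds *= factor  # only the listed ServiceMonitors are affected
        result[name] = f"{seconds}s"
    return result


current = {"kubelet": "30s", "etcd": "30s", "example-app": "30s"}
print(effective_intervals(current, single_node=True))
# {'kubelet': '60s', 'etcd': '60s', 'example-app': '30s'}
```

Keeping the target list explicit limits the change to the ServiceMonitors named above instead of acting as a global override, which is listed as a non-goal.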
Why is this important?
- Reduce CPU usage in SNO setups
- Doubling the scrape interval specifically is important because:
  - We are confident that this has the least chance of interfering with existing rules. Our rate queries typically use a 2 minute window (never shorter). With a 30 second scrape interval (the current default), any 2 minute window contains 4 samples. rate() needs at least 2 samples to work, and we want another 2 for failure tolerance. With a doubled (60 second) interval, most 2 minute windows still contain 2 samples; if a scrape fails, a few rule evaluations might fail intermittently.
- We expect a measurable reduction in CPU usage (see Previous Work)
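The sample arithmetic above can be checked with a small Python sketch. It assumes perfectly regular scrapes and a half-open window (end - window, end], and tries every whole-second alignment of the scrape schedule. This is an idealization: real scrapes jitter and occasionally fail, which is why the text says "most" windows. The helper name `sample_count_range` is hypothetical.

```python
from __future__ import annotations

MIN_SAMPLES_FOR_RATE = 2  # rate() needs at least two samples in its range
RATE_WINDOW_S = 120       # the 2 minute window typically used in the rules


def sample_count_range(window_s: int, interval_s: int) -> tuple[int, int]:
    """Min and max number of scrapes falling in a window (end - window_s, end].

    Assumes perfectly regular scrapes every interval_s seconds and tries every
    whole-second alignment (phase) of the scrape schedule relative to the window.
    """
    end = 10 * window_s
    start = end - window_s
    counts = set()
    for phase in range(interval_s):
        n = sum(1 for t in range(phase, end + 1, interval_s) if start < t <= end)
        counts.add(n)
    return min(counts), max(counts)


for interval in (30, 60):
    lo, hi = sample_count_range(RATE_WINDOW_S, interval)
    spare = lo - MIN_SAMPLES_FOR_RATE  # scrapes that may fail before rate() breaks
    print(f"{interval}s interval: {lo}-{hi} samples per 2m window, "
          f"{spare} missed scrape(s) tolerated")
# 30s interval: 4-4 samples per 2m window, 2 missed scrape(s) tolerated
# 60s interval: 2-2 samples per 2m window, 0 missed scrape(s) tolerated
```

With the doubled interval there is no spare sample left, which matches the expectation that a single failed scrape can make some rule evaluations fail intermittently.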
Scenarios
- RAN deployments (Telco Edge) are SNO deployments. In these setups a full CMO deployment is often not needed, and the default setup consumes too many resources. OpenShift as a whole has only very limited CPU cycles available, and too many of them are spent on monitoring.
Acceptance Criteria
- CI - MUST be running successfully with tests automated
- Release Technical Enablement - Provide necessary release enablement details and documents.
- ...
Previous Work (Optional):
Open questions:
- Whether doubling some scrape intervals reduces CPU usage enough to fit into the assigned budget
Non goals
- Allow arbitrarily long scrape intervals. These would interfere with alerting and recording rules.
- Implement a global override for scrape intervals.
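The non-goal on arbitrarily long intervals follows from the same sample arithmetic as the doubling rationale: with a 2 minute rate window and a minimum of two samples, 60s is the longest interval that still guarantees two samples in every window (assuming regular scrapes), so doubling the 30s default is the limit. A hedged Python sketch of a guard expressing that; `max_safe_interval` and `validate_interval` are hypothetical names, and the 2 minute figure is the typical shortest window mentioned in this epic, not a value read from the actual rules.

```python
MIN_SAMPLES_FOR_RATE = 2      # rate() needs at least two samples
SHORTEST_RATE_WINDOW_S = 120  # assumption: no rule uses a rate() window shorter than 2m


def max_safe_interval(window_s: int = SHORTEST_RATE_WINDOW_S,
                      min_samples: int = MIN_SAMPLES_FOR_RATE) -> int:
    """Longest scrape interval that still fits min_samples scrapes into every
    window of window_s seconds, assuming perfectly regular scrapes."""
    return window_s // min_samples


def validate_interval(interval_s: int) -> None:
    """Hypothetical guard: reject intervals that would starve the shortest rate() window."""
    limit = max_safe_interval()
    if interval_s > limit:
        raise ValueError(
            f"scrape interval {interval_s}s exceeds {limit}s; "
            f"rate() over {SHORTEST_RATE_WINDOW_S}s could get fewer than "
            f"{MIN_SAMPLES_FOR_RATE} samples"
        )


validate_interval(60)  # the doubled 30s default is accepted
```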
1. QE Tracker (Closed, Unassigned)