Type: Feature Request
Resolution: Unresolved
Priority: Normal
Fix Version/s: None
Affects Version/s: None
Component/s: Monitoring
Labels:

Target Version:
None
Activity Type:
Product / Portfolio Work
Status Summary:
None
Blocked:
False
Blocked Reason:
None
Products:
None
Hierarchy Progress Bar:
None
Portfolio Solutions:

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Review Complete:
PX Impact Score:
PX Impact Range:
None
PX Priority Data:
PX Technical Impact:
None
PX Technical Impact Notes:
None
PX Scheduling Request:
None

1. Proposed title of this feature request

Reduce overhead due to Prometheus and node-exporter

2. What is the nature and description of the request?

One of our Telco partners needs more flexibility in how the monitoring stack works, in order to reduce its CPU/memory consumption:

To change the scrape frequency.
To decide which sensors/devices are analyzed.
To disable the monitoring stack, or at least to disable it on workers.

The bigger the cluster or server, the higher the consumption. The main concern is therefore multi-node bare-metal clusters, but SNO is also affected.
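
As a rough illustration of why the first two options matter, the sketch below estimates the Prometheus sample ingestion rate as targets × series per target ÷ scrape interval. The function name and all numbers are hypothetical, not measurements from the partner's clusters, and ingestion rate is only a proxy for the actual CPU/memory cost.

```python
def samples_per_second(targets: int, series_per_target: int,
                       scrape_interval_s: float) -> float:
    """Approximate ingestion rate: each series of each target
    yields one sample per scrape."""
    if scrape_interval_s <= 0:
        raise ValueError("scrape_interval_s must be positive")
    return targets * series_per_target / scrape_interval_s


# Hypothetical cluster: 100 scrape targets exposing 1500 series each.
baseline = samples_per_second(100, 1500, 30)   # 30 s scrape interval
relaxed = samples_per_second(100, 1500, 120)   # longer scrape interval
trimmed = samples_per_second(100, 500, 30)     # fewer collectors/series

print(f"baseline: {baseline:.0f} samples/s")
print(f"relaxed interval: {relaxed:.0f} samples/s")
print(f"trimmed series: {trimmed:.0f} samples/s")
```

Under these assumptions, lengthening the interval and reducing the collected series both lower the ingestion rate proportionally, and the rate grows with the number of targets, which is consistent with bigger clusters being the main concern.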

3. Why does the customer need this? (List the business requirements here)

Telco, and especially RAN, has very strict requirements on performance and resource consumption. The RAN profile already contains several optimizations focused on CPU utilization, such as the PAO, accelerated booting, disabling some systemd services, and other specific tunings. But the monitoring stack still consumes resources that our partners would rather use for their workloads.

From their perspective, the gathered metrics are not needed during their main activities, or at least many of them are not. Alternatively, they may have their own tools to monitor only the metrics they need in their daily activities. They would therefore like more flexibility in how the stack works and how much it consumes, so that the freed CPU/memory can be used to run more workloads.

The optimization seems more needed on bare metal: bigger dedicated servers contain more hardware (devices, sensors, CPUs), so the number of items to be gathered is higher.

The optimization also seems more needed on multi-node clusters. So far we have focused on SNO, where this seems less problematic. Maybe the gathered information is less intensive there, but it could also be because the SNOs run newer OCP versions (4.9, 4.10), while their multi-node clusters are on 4.6, 4.7, and 4.8. In any case, this does not mean they do not want flexibility on SNOs.

4. List any affected packages or components.

Mainly node_exporter and Prometheus, but also the kubelet because of cAdvisor.

relates to

OBSDA-209 Customizations for node-exporter collectors

Closed

OBSDA-211 Implement scrape profiles

Closed

Assignee: Roger Florén

Reporter: Jose Gato Luis

Need Info From: None

Votes: 0

Watchers: 8

Created: 2022/07/05 8:05 AM

Updated: 2025/09/24 8:19 PM

Target start: None

Target end: None
