  1. OpenShift Request For Enhancement
  2. RFE-2991

Reduce overhead due to Prometheus and node-exporter



      1. Proposed title of this feature request

      Reduce overhead due to Prometheus and node-exporter

      2. What is the nature and description of the request?

      One of our Telco partners needs more flexibility in how the monitoring stack works, in order to reduce its CPU/memory consumption:

      • To change the scrape frequency.
      • To decide which sensors/devices are monitored.
      • To disable the monitoring stack entirely, or at least on worker nodes.

      The bigger the clusters/servers, the higher the consumption. The main concern is therefore multi-node bare-metal clusters, but SNO (single-node OpenShift) clusters are also affected.
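
      As a rough illustration of why the scrape frequency and the number of monitored devices matter, the sketch below estimates how many samples per second Prometheus has to ingest. It relies only on the general rule that each time series produces one sample per scrape. The function name and all numbers (10 nodes, 1500 or 500 series per node, 15 s or 60 s intervals) are hypothetical and are not measurements from the partner's clusters.

```python
def samples_per_second(targets: int, series_per_target: int,
                       scrape_interval_s: float) -> float:
    """Approximate ingestion rate: each series yields one sample per scrape."""
    if scrape_interval_s <= 0:
        raise ValueError("scrape interval must be positive")
    return targets * series_per_target / scrape_interval_s


# Hypothetical cluster: 10 nodes, 1500 node_exporter series per node.
baseline = samples_per_second(10, 1500, 15)  # 15 s interval -> 1000.0
relaxed = samples_per_second(10, 1500, 60)   # 60 s interval -> 250.0
# Hypothetical case where fewer devices/sensors are collected as well.
trimmed = samples_per_second(10, 500, 60)
```

      Under these assumptions, a longer scrape interval and a smaller set of collected devices both reduce the ingestion rate proportionally. Actual CPU/memory savings depend on the cluster and would need to be measured.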

      3. Why does the customer need this? (List the business requirements here)

      Telco, and especially RAN, has very specific requirements regarding performance and resource consumption. The RAN profile already contains several optimizations focused on CPU utilization, such as the PAO (Performance Addon Operator), accelerated booting, disabling some systemd services, and other specific optimizations. But the monitoring stack still consumes resources that our partners would rather use for their workloads.

      From their perspective, the gathered metrics are not needed during their main activities, or at least many of them are not. Alternatively, they may have their own tools to monitor only the metrics they need in their daily activities. They would therefore like more flexibility in how the stack works and what it consumes, so that the freed CPU/memory can be used to run more workloads.

      The optimization seems more needed on bare metal: bigger dedicated servers contain more hardware (devices, sensors, CPUs), so the number of items to be gathered is higher.

      The optimization also seems more needed on multi-node clusters. Our focus has so far been on SNO, where the overhead seems less problematic. The gathered information may be less intensive there, but it may also be because the SNOs run newer OCP versions (4.9, 4.10), while the partner's multi-node clusters are on 4.6, 4.7 and 4.8. In any case, this does not mean that they do not want flexibility on SNOs.

      4. List any affected packages or components.

      Mainly node_exporter and Prometheus, but also the kubelet because of its embedded cAdvisor.

            rh-ee-rfloren Roger Florén
            jgato@redhat.com Jose Gato Luis