  1. OpenShift Request For Enhancement
  2. RFE-5452

Add an alert to recommend increasing memory on the OCP nodes running prometheus-k8s pods


      1. Proposed title of this feature request
      Add an alert to recommend increasing memory on the OCP nodes running prometheus-k8s pods.

      2. What is the nature and description of the request?

      The OCP nodes running prometheus-k8s pods become unstable when the Prometheus pods need more memory than the node's total capacity.

      An alert could proactively recommend node sizing based on Prometheus metric data.

      The approximate amount of memory needed by the Prometheus pods can be estimated by multiplying the value of `prometheus_tsdb_head_series` by 8 KB per series:

      Needed RAM = <value of prometheus_tsdb_head_series> * 8 KB
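
      As a rough illustration of the formula above, the following Python sketch computes the estimate for a given series count. The function names are hypothetical, the 8 KB per series figure is taken from this request and interpreted here as 8 KiB (8192 bytes), and the series count of 1,000,000 is an example value only.

```python
# Assumption: "8 KB per series" from the request is treated as 8 KiB (8192 bytes).
BYTES_PER_SERIES = 8 * 1024
GIB = 1024 ** 3


def estimate_prometheus_memory_bytes(head_series: int) -> int:
    """Estimate memory needed, given the value of prometheus_tsdb_head_series."""
    if head_series < 0:
        raise ValueError("head_series must be non-negative")
    return head_series * BYTES_PER_SERIES


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB for display."""
    return num_bytes / GIB


if __name__ == "__main__":
    # Example value only; read the real figure from prometheus_tsdb_head_series.
    needed = estimate_prometheus_memory_bytes(1_000_000)
    print(f"Estimated memory: {to_gib(needed):.2f} GiB")  # prints 7.63 GiB
```

      This is only a sizing heuristic; actual usage also depends on query load, label cardinality, and scrape configuration.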

       

      3. Why does the customer need this? (List the business requirements here)

      Because no memory limit is defined for them, the prometheus-k8s pods use more memory as the number of nodes or pods increases. The larger the cluster, the more memory the Prometheus pods use.

      The nodes running the prometheus-k8s pods (infra nodes in most cases) come under memory pressure, and the prometheus-k8s pods get OOMKilled on the RHCOS node when they need more memory than the node's capacity can provide.

      An alert would tell customers how much RAM the Prometheus pods need to run properly. Customers can then act on the recommendation: either increase the memory at the node level, or accept the loss of the monitoring data under the `/prometheus/wal/*` path in order to reclaim memory from the Prometheus pods.
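
      The decision such an alert would encode might look like the sketch below. It is not an existing OpenShift or Prometheus rule: the function name, the message text, and the 80% threshold are assumptions made for illustration, and the 8 KB per series estimate from this request is again treated as 8 KiB.

```python
from typing import Optional

# Assumption: 8 KB per head series, treated as 8 KiB (8192 bytes).
BYTES_PER_SERIES = 8 * 1024
GIB = 1024 ** 3


def memory_recommendation(
    head_series: int,
    node_memory_bytes: int,
    threshold: float = 0.8,  # hypothetical: warn above 80% of node capacity
) -> Optional[str]:
    """Return a recommendation if the estimate exceeds the threshold, else None."""
    needed = head_series * BYTES_PER_SERIES
    if needed <= threshold * node_memory_bytes:
        return None
    return (
        f"Prometheus is estimated to need about {needed / GIB:.1f} GiB, "
        f"but the node has {node_memory_bytes / GIB:.1f} GiB; "
        "consider increasing node memory."
    )


if __name__ == "__main__":
    # Example values only: 2,000,000 head series on a 16 GiB node.
    print(memory_recommendation(2_000_000, 16 * GIB))
```

      In a real cluster, the inputs would come from `prometheus_tsdb_head_series` and from the memory capacity of the node hosting the pod.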

      4. List any affected packages or components.
      monitoring

              rh-ee-rfloren Roger Florén
              rhn-support-dpateriy Divyam Pateriya