  OpenShift Request For Enhancement
  RFE-5452

Add an alert to recommend increasing memory on the OCP nodes running prometheus-k8s pods



      1. Proposed title of this feature request
      Add an alert to recommend increasing memory on the OCP nodes running prometheus-k8s pods.

      2. What is the nature and description of the request?

      The OCP nodes running the prometheus-k8s pods become unstable when the Prometheus pods request more memory than the node's total capacity.

      Proactively, an alert could be created that recommends node sizing based on Prometheus metric data.

      The approximate amount of memory needed by the Prometheus pods can be estimated by multiplying the value of `prometheus_tsdb_head_series` by 8 KB, as shown below:

      Needed RAM = <value of prometheus_tsdb_head_series> * 8 KB
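      A minimal PromQL sketch of this estimate (the 8 KB-per-series factor is the rule of thumb above; the `namespace="openshift-monitoring"` selector is an assumption added for illustration):

      # Estimated memory (bytes) needed per prometheus-k8s pod:
      # number of in-memory head series multiplied by roughly 8 KB per series
      prometheus_tsdb_head_series{namespace="openshift-monitoring"} * 8 * 1024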

       

      3. Why does the customer need this? (List the business requirements here)

      The prometheus-k8s pods use more and more memory (no memory limit is defined) as the number of nodes or pods in the cluster grows. The bigger the cluster, the more memory the Prometheus pods will use.

      The infra nodes (in most cases) running the prometheus-k8s pods come under memory pressure, and the prometheus-k8s pods get OOMKilled by the RHCOS node when it cannot provide memory beyond its capacity.

      Implementing an alert would help by suggesting to customers the amount of RAM needed for the Prometheus pods to run properly. Customers can then react to the recommendation and decide whether to increase memory at the node level, or whether they are willing to sacrifice the monitoring data under `/prometheus/wal/*` to reclaim the memory used by the Prometheus pods. A sketch of such an alert condition is shown below.
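      As an illustrative sketch only of such an alert condition (the 80% threshold and the comparison against `node_memory_MemTotal_bytes` via `scalar(min(...))` assume uniformly sized infra nodes; both are assumptions, not part of this request):

      # Fire when the estimated memory need of a prometheus-k8s pod exceeds
      # 80% of the total memory of the (smallest) node
      (prometheus_tsdb_head_series{namespace="openshift-monitoring"} * 8 * 1024)
        > 0.8 * scalar(min(node_memory_MemTotal_bytes))

      In practice this would likely be shipped as an alerting rule in the cluster monitoring stack; the exact packaging and threshold are left to the implementation.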

      4. List any affected packages or components.
      monitoring
