
Type: Story
Resolution: Done
Priority: Major
Fix Version/s: None
Affects Version/s: None
Labels:
- alerting-runbook
- mco

Blocked:
False
Blocked Reason:
None
Ready:
False
Epic Link:
Actionable Error Messaging
Release Note Type:
Enhancement
Intelligence Requested:
Market:

WSJF:
0

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

MCO will send an alert when a control plane node has had extremely high memory usage for 45 minutes.
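The alert fires only after the high-memory condition has held for 45 minutes. MCO's alerts are Prometheus rules, and this kind of sustained-duration behavior comes from a rule's `for:` clause: the alert goes "pending" when the expression first becomes true and "firing" only if it stays true for the whole duration. The minimal Python sketch below illustrates those semantics. The 90% threshold, the `Sample` type, and `would_fire` are hypothetical and used only for illustration. The real rule's expression, threshold, and evaluation interval may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional

# Hypothetical threshold: the ticket only says "extremely high"; 90% is an
# assumption for illustration, not a value taken from the MCO rule.
THRESHOLD_PERCENT = 90.0
# The ticket states the alert is sent after 45 minutes.
FOR_DURATION = timedelta(minutes=45)


@dataclass
class Sample:
    timestamp: datetime
    memory_utilization_percent: float


def would_fire(samples: Iterable[Sample],
               threshold: float = THRESHOLD_PERCENT,
               for_duration: timedelta = FOR_DURATION) -> bool:
    """Return True if utilization stayed above `threshold` continuously for
    at least `for_duration`, mimicking a Prometheus `for:` clause.

    Samples are assumed to be sorted by timestamp (one per rule evaluation).
    """
    streak_start: Optional[datetime] = None
    for s in samples:
        if s.memory_utilization_percent > threshold:
            if streak_start is None:
                streak_start = s.timestamp  # alert becomes "pending"
            if s.timestamp - streak_start >= for_duration:
                return True  # alert would now be "firing"
        else:
            streak_start = None  # condition cleared, pending state resets
    return False
```

A single evaluation below the threshold resets the streak, which is why short memory spikes do not trigger this alert.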

The alert contains the following summary and description:

summary: >-
  Extreme memory utilization per node within control plane nodes is extremely high, and could impact responsiveness and stability.
description: >-
  The memory utilization per instance within control plane nodes influence the stability, and responsiveness of the cluster.
  This can lead to cluster instability and slow responses from kube-apiserver or failing requests especially on etcd.
  Moreover, OOM kill is expected which negatively influences the pod scheduling.
  If this happens on container level, the descheduler will not be able to detect it, as it works on the pod level.
  To fix this, increase memory of the affected node of control plane nodes.
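The alert is driven by memory utilization per control plane instance. The minimal Python sketch below shows one way to derive that figure from `/proc/meminfo`-style data when troubleshooting a node. It treats `MemFree`, `Buffers`, and `Cached` as reclaimable memory. That formula is an assumption modeled on common node_exporter-based expressions, not a copy of the MCO rule. The function names `parse_meminfo` and `memory_utilization_percent` are hypothetical.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style lines (e.g. "MemTotal:  16384 kB") into a
    dict of field name to integer value (kB for most fields)."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def memory_utilization_percent(meminfo: Dict[str, int]) -> float:
    """Utilization as (1 - (MemFree + Buffers + Cached) / MemTotal) * 100.

    Assumed formula for illustration only; check the actual alert
    expression in the machine-config-operator repository.
    """
    total = meminfo["MemTotal"]
    reclaimable = meminfo["MemFree"] + meminfo["Buffers"] + meminfo["Cached"]
    return (1 - reclaimable / total) * 100
```

The input text can come from reading `/proc/meminfo` on the affected control plane node, for example from an `oc debug node/<node-name>` session.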

An admin looking at the alert and the cluster state may not be able to determine the exact action to take. Adding a runbook (https://github.com/openshift/runbooks) can help admins troubleshoot more effectively and take the appropriate action.

Acceptance Criteria:

A runbook doc is created for the ExtremelyHighIndividualControlPlaneMemory alert.
The created runbook link is accessible to cluster admins from the ExtremelyHighIndividualControlPlaneMemory alert.

relates to

MCO-427 Add missing runbooks for Prometheus rules (Closed)

links to

openshift/machine-config-operator#4976: MCO-1587: Add runbook for ExtremelyHighIndividualControlPlaneMemory

openshift/runbooks#242: MCO-1587: add runbook for ExtremelyHighIndividualControlPlaneMemory

Assignee: Courtney Ruhm

Reporter: Courtney Ruhm

Votes: 0

Watchers: 2

Created: 2025/03/06 11:06 PM

Updated: 2025/04/08 10:49 PM

Resolved: 2025/04/08 10:49 PM
