Type: Story
Resolution: Done
Priority: Major
Fix Version/s: None
Affects Version/s: None
Labels:
- alerting-runbook
- mco

Story Points:
5
Blocked:
False
Blocked Reason:
None
Ready:
False
Epic Link:
Actionable Error Messaging
Release Note Type:
Enhancement
Intelligence Requested:
Market:

Cost of Delay:
0
WSJF:
0.000

SFDC Cases Links:
SFDC Cases Counter:
SFDC Cases Open:

MCO sends an alert when, for 15 minutes, a specific node uses more memory than is reserved.

The alert is described as follows:

summary: "Alerts the user when, for 15 minutes, a specific node is using more memory than is reserved"
description: "System memory usage of {{ $value | humanize }} on {{ $labels.node }} exceeds 95% of the reservation. Reserved memory ensures system processes can function even when the node is fully allocated and protects against workload out of memory events impacting the proper functioning of the node. The default reservation is expected to be sufficient for most configurations and should be increased (https://docs.openshift.com/container-platform/latest/nodes/nodes/nodes-nodes-managing.html) when running nodes with high numbers of pods (either due to rate of change or at steady state)."
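
To make the firing condition concrete for a runbook reader, the following is a minimal sketch of the logic the summary and description imply: usage stays above 95% of the reservation for 15 minutes. It is an illustration only, not the rule expression MCO ships. The `Sample` type, its field names, and `alert_fires` are hypothetical, and the check only approximates how an alerting system evaluates a sustained condition.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

THRESHOLD_RATIO = 0.95   # from the alert description: "exceeds 95% of the reservation"
HOLD_SECONDS = 15 * 60   # from the alert summary: "for 15 minutes"


@dataclass
class Sample:
    """One hypothetical measurement of system memory usage on a node."""
    timestamp: int       # seconds since an arbitrary starting point
    used_bytes: float    # memory used by system processes at that time


def alert_fires(samples: Iterable[Sample], reserved_bytes: float) -> bool:
    """Return True if usage stayed above the threshold for HOLD_SECONDS."""
    breach_start: Optional[int] = None
    for sample in sorted(samples, key=lambda s: s.timestamp):
        if sample.used_bytes > THRESHOLD_RATIO * reserved_bytes:
            # Remember when the current uninterrupted breach began.
            if breach_start is None:
                breach_start = sample.timestamp
            if sample.timestamp - breach_start >= HOLD_SECONDS:
                return True
        else:
            # Any sample at or below the threshold resets the window.
            breach_start = None
    return False
```

In this sketch, a single sample that drops back under the threshold restarts the 15-minute window, which is why a brief spike in system memory usage would not be expected to trigger the alert.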

An admin may not be able to determine the exact action to take from the alert and the cluster state alone. Adding a runbook (https://github.com/openshift/runbooks) can help the admin troubleshoot and take appropriate action.

Acceptance Criteria:

A runbook doc is created for the SystemMemoryExceedsReservation alert
The created runbook link is accessible to the cluster admin with the SystemMemoryExceedsReservation alert

relates to

MCO-427 Add missing runbooks for Prometheus rules

To Do

links to

openshift/machine-config-operator#4832: MCO-1492: Add new runbook for SystemMemoryExceedsReservation to alert

openshift/runbooks#230: MCO-1492: Add runbook SystemMemoryExceedsReservation

Assignee: Courtney Ruhm

Reporter: Courtney Ruhm

Votes: 0

Watchers: 2

Created: 2025/01/09 1:05 AM

Updated: 2025/02/10 7:58 PM

Resolved: 2025/02/10 7:58 PM
