-
Story
-
Resolution: Unresolved
-
Major
-
OCPSTRAT-554 - Improving error handling, propagation, collection, and disambiguation for users
-
Enhancement
The MCO sends an alert when, for 15 minutes, a specific node is using more memory than is reserved.
The alert is described as follows:
summary: "Alerts the user when, for 15 minutes, a specific node is using more memory than is reserved"
description: "System memory usage of {{ $value | humanize }} on {{ $labels.node }} exceeds 95% of the reservation. Reserved memory ensures system processes can function even when the node is fully allocated and protects against workload out of memory events impacting the proper functioning of the node. The default reservation is expected to be sufficient for most configurations and should be increased (https://docs.openshift.com/container-platform/latest/nodes/nodes/nodes-nodes-managing.html) when running nodes with high numbers of pods (either due to rate of change or at steady state)."
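To make the firing condition concrete for the runbook, here is a minimal Python sketch of the semantics stated in the alert text: usage above 95% of the reservation, sustained for 15 minutes. It is not the MCO's actual Prometheus rule. The `Sample` type, its field names, and `alert_fires` are hypothetical names introduced only for illustration, and the real alert is evaluated by Prometheus rather than by code like this.

```python
from dataclasses import dataclass

THRESHOLD_FRACTION = 0.95  # "exceeds 95% of the reservation" (from the alert text)
HOLD_SECONDS = 15 * 60     # "for 15 minutes" (from the alert text)


@dataclass
class Sample:
    """One hypothetical observation of a node's system memory."""
    timestamp: int         # seconds, any monotonic origin
    used_bytes: float      # system memory usage on the node
    reserved_bytes: float  # memory reserved for system processes


def exceeds_reservation(sample: Sample) -> bool:
    """True when usage is above 95% of the reserved memory."""
    return sample.used_bytes > THRESHOLD_FRACTION * sample.reserved_bytes


def alert_fires(samples: list[Sample]) -> bool:
    """True if consecutive breaching samples span at least HOLD_SECONDS."""
    breach_start = None
    for s in sorted(samples, key=lambda x: x.timestamp):
        if exceeds_reservation(s):
            if breach_start is None:
                breach_start = s.timestamp
            if s.timestamp - breach_start >= HOLD_SECONDS:
                return True
        else:
            # Any sample back under the threshold resets the 15-minute window.
            breach_start = None
    return False
```

A runbook could use the same reasoning in its diagnosis steps: a brief spike that drops back under the threshold does not fire the alert, so a firing alert indicates sustained pressure on the reserved memory.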
An admin may not be able to determine the exact action to take from the alert and the cluster state alone. Adding a runbook (https://github.com/openshift/runbooks) can help the admin troubleshoot and take appropriate action.
Acceptance Criteria:
- A runbook doc is created for the SystemMemoryExceedsReservation alert
- The created runbook link is accessible to the cluster admin from the SystemMemoryExceedsReservation alert
- relates to
-
MCO-427 Add missing runbooks for Prometheus rules
- To Do
- links to