  1. Machine Config Operator
  2. MCO-1492

Add Runbook for SystemMemoryExceedsReservation


    • Story
    • Resolution: Unresolved
    • Major
    • OCPSTRAT-554 - Improving error handling, propagation, collection, and disambiguation for users
    • Enhancement

      MCO sends an alert when, for 15 minutes, a specific node is using more memory than is reserved.

      The alert describes the following:

      summary: "Alerts the user when, for 15 minutes, a specific node is using more memory than is reserved"
      description: "System memory usage of {{ $value | humanize }} on {{ $labels.node }} exceeds 95% of the reservation. Reserved memory ensures system processes can function even when the node is fully allocated and protects against workload out of memory events impacting the proper functioning of the node. The default reservation is expected to be sufficient for most configurations and should be increased (https://docs.openshift.com/container-platform/latest/nodes/nodes/nodes-nodes-managing.html) when running nodes with high numbers of pods (either due to rate of change or at steady state)."
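
For the runbook, it may help to spell out the condition the alert text implies. The following is only an illustrative sketch in Python, not the actual alert rule (which is evaluated by the cluster's monitoring stack). The 95% threshold and the 15 minute hold come from the alert text above; the `Sample` type, the `alert_fires` function, and the sampling interval are hypothetical names and assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Values taken from the alert text; everything else is an assumption.
THRESHOLD_FRACTION = 0.95   # "exceeds 95% of the reservation"
HOLD_SECONDS = 15 * 60      # "for 15 minutes"


@dataclass(frozen=True)
class Sample:
    """Hypothetical measurement of system memory usage on one node."""
    timestamp: float      # seconds since an arbitrary start
    system_bytes: float   # memory used by system processes


def alert_fires(samples: Iterable[Sample], reserved_bytes: float) -> bool:
    """Return True if usage stayed above 95% of the reservation
    without interruption for at least 15 minutes."""
    limit = THRESHOLD_FRACTION * reserved_bytes
    breach_start: Optional[float] = None
    for s in sorted(samples, key=lambda s: s.timestamp):
        if s.system_bytes > limit:
            if breach_start is None:
                breach_start = s.timestamp  # breach begins here
            if s.timestamp - breach_start >= HOLD_SECONDS:
                return True
        else:
            breach_start = None  # any dip below the limit resets the timer
    return False
```

In this sketch a single sample below the limit resets the timer, so a short spike does not count; only a breach that lasts the full 15 minutes does.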

      An admin may not be able to determine the exact action to take from the alert and the cluster state alone. Adding a runbook (https://github.com/openshift/runbooks) can help admins troubleshoot and take appropriate action.

       

      Acceptance Criteria:

      • A runbook doc is created for the SystemMemoryExceedsReservation alert
      • The created runbook link is accessible to the cluster admin along with the SystemMemoryExceedsReservation alert
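
One possible way to check the second criterion is sketched below in Python. It assumes the link is exposed through a `runbook_url` annotation on the alert rule, which is a common convention but is not stated in this issue. The `has_runbook_link` function and the dictionary layout of the rule are hypothetical; only the alert name and the repository URL come from the text above.

```python
RUNBOOKS_REPO = "https://github.com/openshift/runbooks"  # from the issue text
ALERT_NAME = "SystemMemoryExceedsReservation"            # from the issue text


def has_runbook_link(rule: dict) -> bool:
    """Return True if the rule is the target alert and carries a
    runbook link that points into the runbooks repository.

    Assumption: the rule is a parsed alerting rule with an "alert" name
    and an "annotations" mapping that holds a "runbook_url" key.
    """
    if rule.get("alert") != ALERT_NAME:
        return False
    url = rule.get("annotations", {}).get("runbook_url", "")
    return url.startswith(RUNBOOKS_REPO + "/") and ALERT_NAME in url
```

A check like this could run against the parsed rule in a unit test, so that a missing or mistyped link is caught before the alert ships.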

       

              rhn-support-cruhm Courtney Ruhm