[MCO-1537] Add runbook for MCDRebootError alert

Type: Story
Resolution: Done
Priority: Undefined
Fix Version/s: None
Affects Version/s: None
Labels:
- alerting-runbook
- mco

Blocked: False
Blocked Reason: None
Ready: False
Epic Link: Actionable Error Messaging
Intelligence Requested:
Market:

WSJF: 0

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

The MCC sends the MCDRebootError alert when a node fails to reboot within a span of 5 minutes. This ensures that the admin can take appropriate action, if required, by looking at the pod logs. The alert states where to find those logs.

An example alert looks like:

Reboot failed on {{ $labels.node }} , update may be blocked. For more details: oc logs -f -n {{ $labels.namespace }} {{ $labels.pod }} -c machine-config-daemon
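The `{{ $labels.... }}` placeholders above are Go `text/template` actions that Prometheus fills in from the firing alert's labels. The minimal sketch below shows how the message expands into the text the admin actually sees. It mimics Prometheus by prepending a `$labels` definition to the template. The function `renderAlert` and the sample label values (`worker-0`, `machine-config-daemon-abcde`) are hypothetical and for illustration only. This is not the MCO or Prometheus source.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

// alertMessage is the example alert text quoted in this issue.
const alertMessage = "Reboot failed on {{ $labels.node }} , update may be blocked. " +
	"For more details: oc logs -f -n {{ $labels.namespace }} {{ $labels.pod }} -c machine-config-daemon"

// renderAlert (hypothetical helper) expands msg against an alert's label set.
// Prometheus-style templates refer to labels via $labels, so we bind that
// variable first. missingkey=error makes an absent label fail loudly instead
// of rendering "<no value>".
func renderAlert(msg string, labels map[string]string) (string, error) {
	tmpl, err := template.New("alert").
		Option("missingkey=error").
		Parse("{{ $labels := .Labels }}" + msg)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	data := struct{ Labels map[string]string }{Labels: labels}
	if err := tmpl.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// Illustrative label values; a real alert carries the failing node and MCD pod.
	labels := map[string]string{
		"node":      "worker-0",
		"namespace": "openshift-machine-config-operator",
		"pod":       "machine-config-daemon-abcde",
	}
	out, err := renderAlert(alertMessage, labels)
	if err != nil {
		fmt.Fprintln(os.Stderr, "render failed:", err)
		os.Exit(1)
	}
	// The expanded text includes a ready-to-run `oc logs` command for the MCD container.
	fmt.Println(out)
}
```

Binding `$labels` up front lets the alert text be rendered and checked in isolation, for example in a unit test, before it ships in a rule.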

The admin may not be able to work out the exact action to take from the MCC pod logs alone. Adding a runbook (https://github.com/openshift/runbooks) can help the admin troubleshoot the failure and take appropriate action.

Acceptance Criteria:

- A runbook doc is created for the MCDRebootError alert.
- The created runbook link is accessible to the cluster admin from the MCDRebootError alert.
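The second criterion can be checked mechanically once the alert rule carries a runbook link. The sketch below assumes two things. First, the link is exposed as a `runbook_url` annotation on the alert. Second, runbooks live under `https://github.com/openshift/runbooks/` in a file named after the alert. Both are assumptions about the conventions in use, and `checkRunbookURL` is a hypothetical helper, not part of the MCO code base.

```go
package main

import (
	"fmt"
	"strings"
)

// runbooksRepo is the repository named in this issue (assumed URL prefix).
const runbooksRepo = "https://github.com/openshift/runbooks/"

// checkRunbookURL (hypothetical) verifies that an alert's annotations link to
// a runbook for alertName inside the openshift/runbooks repository.
// Assumptions: the annotation key is "runbook_url" and the runbook file is
// named "<AlertName>.md".
func checkRunbookURL(alertName string, annotations map[string]string) error {
	raw, ok := annotations["runbook_url"]
	if !ok || raw == "" {
		return fmt.Errorf("%s: missing runbook_url annotation", alertName)
	}
	if !strings.HasPrefix(raw, runbooksRepo) {
		return fmt.Errorf("%s: runbook_url %q is not under %s", alertName, raw, runbooksRepo)
	}
	if !strings.HasSuffix(raw, "/"+alertName+".md") {
		return fmt.Errorf("%s: runbook_url %q does not name %s.md", alertName, raw, alertName)
	}
	return nil
}

func main() {
	// Illustrative annotation set; the exact runbook path is an assumption.
	annotations := map[string]string{
		"runbook_url": "https://github.com/openshift/runbooks/blob/master/alerts/machine-config-operator/MCDRebootError.md",
	}
	if err := checkRunbookURL("MCDRebootError", annotations); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("OK")
}
```

A check like this only confirms the link is present and well-formed. Confirming that the page is reachable by the admin still needs a manual or HTTP-level check.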

relates to

MCO-427 Add missing runbooks for Prometheus rules (status: To Do)

links to

openshift/machine-config-operator#4895: MCO-1537: Add MCDRebootError runbook to prometheus rules

openshift/runbooks#239: MCO-1537: Add runbook for MCDRebootError

There are no comments yet on this issue.

Assignee:: Courtney Ruhm

Reporter:: Courtney Ruhm

Votes:: 0

Watchers:: 1

Created:: 2025/02/10 8:08 PM

Updated:: 2025/03/30 3:49 PM

Resolved:: 2025/03/06 10:56 PM