[RHEL-29861] The "pcmk_monitor_timeout" default value in multiple documentation is listed as 60s, but should be 20s - Red Hat Issue Tracker

Type: Bug
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: rhel-9.4
Component/s: pacemaker
Labels:
None

Regression:
None
Severity:
None

Pool Team:

rhel-sst-high-availability

Story Points:
3
ACKs Check:

Dev ack
Blocked:
False
Blocked Reason:

None

Product Documentation Required:
Yes
Sprint:
None

Preliminary Testing:
None
Test Coverage:
None

Release Note Type:
Bug Fix
Release Note Text:

Cause (the user action or circumstances that trigger the bug):
Consequence (what the user experience is when the bug occurs):
Fix (what has changed to fix the bug; do not include overly technical details):
Result (what happens now that the patch is applied):

Release Note Status:
Proposed

Experience:
Architecture:

All

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Planning:
None

We have identified that the default value of `pcmk_monitor_timeout` for stonith devices is reported inaccurately throughout our documentation and man pages. The default is listed as 60s (based on `stonith-timeout`), but because `pcmk_monitor_timeout` is not actually applied unless it is explicitly set, that value is misleading. The actual default monitor timeout is 20s, so the following documentation and man pages should be updated (upstream and in RHEL):

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/high_availability_add-on_reference/s1-fencedevicesadditional-haar
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/configu[...]-fencing-configuring-and-managing-high-availability-clusters
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/configu[...]-fencing-configuring-and-managing-high-availability-clusters

$ man pacemaker-fenced
--------------------------------8<----------------------------- 
pcmk_monitor_timeout = time [60s]
    Advanced use only: Specify an alternate timeout to use for monitor
    actions instead of stonith-timeout
    Some devices need much more/less time to complete than normal.
    Use this to specify an alternate, device-specific, timeout 
    for 'monitor' actions.
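
The discrepancy described above can be illustrated with a minimal sketch. This is only a model of the behavior as this report states it, not Pacemaker code: the function name `effective_monitor_timeout` and both constants are hypothetical, and the real resolution logic is exactly what task (1) below still needs to establish.

```python
from typing import Optional

# Values taken from this report; both names are hypothetical.
DOCUMENTED_DEFAULT_S = 60  # default listed in the docs and man page
OBSERVED_DEFAULT_S = 20    # monitor timeout the report says applies in practice


def effective_monitor_timeout(pcmk_monitor_timeout: Optional[int] = None) -> int:
    """Return the monitor timeout in seconds, as this report describes it.

    Assumption: pcmk_monitor_timeout only takes effect when it is
    explicitly set; otherwise the 20s default applies, not the
    documented 60s.
    """
    if pcmk_monitor_timeout is not None:
        return pcmk_monitor_timeout
    return OBSERVED_DEFAULT_S


if __name__ == "__main__":
    print(effective_monitor_timeout())    # nothing set explicitly
    print(effective_monitor_timeout(90))  # explicit device-specific value
```

Under this model, a device with no explicit `pcmk_monitor_timeout` never sees the documented 60s value, which is the mismatch this bug asks to correct.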

Slack discussion of this issue:

https://redhat-internal.slack.com/archives/C04HH4AJYH4/p1710789736264799

After discussion with Kenneth Gaillot and engineering, these are the tasks we wish to complete with this bug:

(1) figure out how the fencing monitor timeouts currently work

(2) decide and implement how they should be defined and used

(3) update the upstream documentation appropriately. The same defaults also appear in the pacemaker-fenced man page, which needs updating as well.

(4) update the RHEL documentation.

For the official documentation updates, the following DOC request has been opened:

[RHELDOCS-17816] Update documentation for pcmk_monitor_timeout
https://issues.redhat.com/browse/RHELDOCS-17816

This issue is also an extension of the issues being reviewed in the following bug:

RHEL-14826 A stop action for a stonith device timed out leading to a cluster node being fenced
https://issues.redhat.com/browse/RHEL-14826

is related to

RHEL-14826 A stop action for a stonith device timed out leading to a cluster node being fenced

In Progress

links to

ClusterLabs T785

Assignee: Reid Wahl

Reporter: Joshua Baker

Developer: Kenneth Gaillot (Inactive)

QA Contact: Jana Rehova

Votes: 0

Watchers: 7

Created: 2024/03/20 9:04 PM

Updated: 2025/04/24 2:09 PM
