Bug
Resolution: Duplicate
rhel-9.4
rhel-ha
What were you trying to do that didn't work? The cluster consists of 4 hosts: svtm501, svtm502, svtm503, svtm504. svtm502 was the DC. While host svtm504 was rebooting, Pacemaker was unexpectedly shut down on another host, svtm503.
- The user triggered a reboot of host svtm504 around Mar 11 04:24:17.
- In the DC's Pacemaker log (svtm502), host svtm504 was detected as going down at Mar 11 04:24:17:
Mar 11 04:24:17.215 svtm502 pacemaker-controld [30919] (pcmk__update_peer_expected) info: handle_request: Node svtm504[4] - expected state is now down (was member)
Mar 11 04:24:17.215 svtm502 pacemaker-controld [30919] (handle_shutdown_request) info: Creating shutdown request for svtm504 (state=S_TRANSITION_ENGINE)
- At Mar 11 04:24:18, node svtm503 was somehow detected as shutting down. THIS IS UNEXPECTED.
Mar 11 04:24:18.271 svtm502 pacemaker-schedulerd[30918] (determine_online_status) info: svtm503 is shutting down
- One peculiar thing we noted in the svtm501 Pacemaker log: the shutdown attribute could not be written earlier, when svtm503 was rebooted:
Mar 11 03:39:00.227 svtm501 pacemaker-attrd [24774] (write_attribute) notice: Cannot update shutdown[svtm503]='1741678740' now because node's UUID is unknown (will retry if learned)
- Also in the svtm501 log, this shutdown attribute was eventually written at Mar 11 04:24:17.230, around the same time that Pacemaker began shutting down on node svtm503:
Mar 11 04:24:17.230 svtm501 pacemaker-attrd [24774] (attrd_cib_callback) info: CIB update 4156 result for shutdown: OK | rc=0
Mar 11 04:24:17.230 svtm501 pacemaker-attrd [24774] (attrd_cib_callback) info: * Wrote shutdown[svtm504]=1741681457
Mar 11 04:24:17.230 svtm501 pacemaker-attrd [24774] (attrd_cib_callback) info: * Wrote shutdown[svtm503]=1741678740
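The two values written above can be compared directly. The following is a minimal sketch that assumes the shutdown attribute values are Unix epoch seconds; that assumption is consistent with the log timestamps if the hosts log in a UTC-4 time zone, which is inferred rather than stated in this report. The helper names are hypothetical and the log lines are copied from the excerpt above.

```python
import re
from datetime import datetime, timezone

# Log lines copied verbatim from the svtm501 excerpt in this report.
LOG_LINES = [
    "Mar 11 04:24:17.230 svtm501 pacemaker-attrd [24774] (attrd_cib_callback) info: * Wrote shutdown[svtm504]=1741681457",
    "Mar 11 04:24:17.230 svtm501 pacemaker-attrd [24774] (attrd_cib_callback) info: * Wrote shutdown[svtm503]=1741678740",
]

# Matches the "Wrote shutdown[<node>]=<value>" part of an attrd log line.
WROTE_RE = re.compile(r"Wrote shutdown\[(?P<node>[^\]]+)\]=(?P<epoch>\d+)")


def parse_shutdown_writes(lines):
    """Return {node: value} for every 'Wrote shutdown[...]' line (hypothetical helper)."""
    writes = {}
    for line in lines:
        match = WROTE_RE.search(line)
        if match:
            writes[match.group("node")] = int(match.group("epoch"))
    return writes


def to_utc(epoch):
    """Interpret a value as epoch seconds (an assumption) and convert it to UTC."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc)


writes = parse_shutdown_writes(LOG_LINES)
newest = max(writes.values())
for node, epoch in sorted(writes.items()):
    age = newest - epoch
    print(f"{node}: {to_utc(epoch):%Y-%m-%d %H:%M:%S} UTC, {age} s older than the newest value")
```

Under that assumption, the svtm503 value is 2717 seconds (about 45 minutes) older than the svtm504 value and decodes to 07:39:00 UTC. That lines up with the 03:39:00 log entry from the earlier svtm503 reboot, which suggests the value written at 04:24:17 was the one queued during that earlier reboot rather than a new shutdown request.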
I am wondering whether the failed attempt, and the resulting delayed write, of the shutdown attribute for node svtm503 caused the unexpected Pacemaker shutdown on svtm503. If not, what else could have caused it?
We noticed that this started to occur in 2.1.9-1.
What is the impact of this issue to you? Unexpected cluster outage when a node is rebooted.
Please provide the package NVR for which the bug is seen:
Pacemaker 2.1.9-1
How reproducible is this bug?: Seen only once so far.
Steps to reproduce
- Set up a cluster of 4 nodes.
- Reboot a node. Wait for it to rejoin the cluster successfully after the host comes back online.
- Reboot another node. Pacemaker is shut down on the previously rebooted node. This has been seen only once so far.
Expected results: No unexpected Pacemaker shutdown on another node when one node is rebooted.
Actual results: Pacemaker was shut down on another node.
- impacts account
RHEL-23082 Avoid "shutdown" node attribute persisting after shutdown [rhel-10]
- In Progress