
Type: Bug
Resolution: Duplicate
Priority: Undefined
Fix Version/s: None
Affects Version/s: rhel-9.4
Component/s: pacemaker
Labels: None
Regression: No
Severity: None
AssignedTeam: rhel-ha
Story Points: None
Blocked: False
Ready: False
Blocked Reason: None
Product Documentation Required: None
Sprint: None
Preliminary Testing: None
Test Coverage: None
ProdDocsReview-CCS: Unspecified
ProdDocsReview-Dev: Unspecified
ProdDocsReview-QE: Unspecified
SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:
Planning: None

What were you trying to do that didn't work? The cluster consists of 4 hosts: svtm501, svtm502, svtm503, and svtm504. svtm502 was the DC. While host svtm504 was rebooting, Pacemaker was unexpectedly shut down on another host, svtm503.

The user triggered a reboot of host svtm504 at around Mar 11 04:24:17.

From the DC's Pacemaker log (svtm502): at Mar 11 04:24:17, svtm504's expected state changed to down and a shutdown request was created for it:

Mar 11 04:24:17.215 svtm502 pacemaker-controld [30919] (pcmk__update_peer_expected) info: handle_request: Node svtm504[4] - expected state is now down (was member)

Mar 11 04:24:17.215 svtm502 pacemaker-controld [30919] (handle_shutdown_request) info: Creating shutdown request for svtm504 (state=S_TRANSITION_ENGINE)

- At Mar 11 04:24:18, the scheduler on the DC reported that node svtm503 was shutting down. THIS IS UNEXPECTED.

Mar 11 04:24:18.271 svtm502 pacemaker-schedulerd[30918] (determine_online_status) info: svtm503 is shutting down

- One peculiar thing: the svtm501 Pacemaker log shows that, earlier, when svtm503 was rebooted, the shutdown attribute for svtm503 could not be written:

Mar 11 03:39:00.227 svtm501 pacemaker-attrd [24774] (write_attribute) notice: Cannot update shutdown[svtm503]='1741678740' now because node's UUID is unknown (will retry if learned)
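The value logged for shutdown[svtm503] appears to be a Unix epoch timestamp recording when the shutdown was requested. A minimal Python sketch that decodes it, assuming the hosts log in US Eastern Daylight Time (UTC-4), which the logs do not state; `decode_shutdown_value` and `LOG_TZ` are hypothetical names used only for illustration:

```python
from datetime import datetime, timedelta, timezone

# Assumption: the hosts log in US Eastern Daylight Time (UTC-4).
# The logs do not state a time zone; change LOG_TZ to match your hosts.
LOG_TZ = timezone(timedelta(hours=-4))


def decode_shutdown_value(value: str) -> datetime:
    """Hypothetical helper: interpret a 'shutdown' attribute value as Unix epoch seconds."""
    return datetime.fromtimestamp(int(value), tz=LOG_TZ)


# Value from the failed write logged on svtm501 at 03:39:00.227
svtm503_request = decode_shutdown_value("1741678740")
print(svtm503_request.strftime("%b %d %H:%M:%S"))  # Mar 11 03:39:00
```

Under that time zone assumption, the value corresponds to 03:39:00, the same second at which svtm501 logged the failed write during svtm503's earlier reboot.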

On svtm501, this shutdown attribute was finally written at Mar 11 04:24:17.230, around the same time that Pacemaker started shutting down on svtm503:

Mar 11 04:24:17.230 svtm501 pacemaker-attrd [24774] (attrd_cib_callback) info: CIB update 4156 result for shutdown: OK | rc=0
Mar 11 04:24:17.230 svtm501 pacemaker-attrd [24774] (attrd_cib_callback) info: * Wrote shutdown[svtm504]=1741681457
Mar 11 04:24:17.230 svtm501 pacemaker-attrd [24774] (attrd_cib_callback) info: * Wrote shutdown[svtm503]=1741678740
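Comparing the two values written in the same CIB update makes the gap concrete: the svtm503 value is 2717 seconds (45 min 17 s) older than the svtm504 value, which matches the interval between the failed write at 03:39:00 and this write at 04:24:17. That suggests the value written for svtm503 was left over from its earlier reboot rather than reflecting a new shutdown request. The Python sketch below scans attrd log lines for this pattern; `find_stale_shutdown_values` is a hypothetical helper, and its 60-second threshold is an arbitrary assumption, not a Pacemaker setting.

```python
import re

# Matches pacemaker-attrd lines such as:
#   ... (attrd_cib_callback) info: * Wrote shutdown[svtm503]=1741678740
WROTE_SHUTDOWN = re.compile(r"\* Wrote shutdown\[(?P<node>[^\]]+)\]=(?P<value>\d+)")


def find_stale_shutdown_values(lines, max_skew_s=60):
    """Hypothetical helper: flag shutdown values much older than the newest one.

    Returns (node, value, age_in_seconds) tuples, where age is measured
    against the newest shutdown value seen in the given lines.
    """
    written = {}
    for line in lines:
        match = WROTE_SHUTDOWN.search(line)
        if match:
            # A later line for the same node overwrites an earlier one.
            written[match.group("node")] = int(match.group("value"))
    if not written:
        return []
    newest = max(written.values())
    return [
        (node, value, newest - value)
        for node, value in sorted(written.items())
        if newest - value > max_skew_s
    ]


log_lines = [
    "Mar 11 04:24:17.230 svtm501 pacemaker-attrd [24774] (attrd_cib_callback) info: CIB update 4156 result for shutdown: OK | rc=0",
    "Mar 11 04:24:17.230 svtm501 pacemaker-attrd [24774] (attrd_cib_callback) info: * Wrote shutdown[svtm504]=1741681457",
    "Mar 11 04:24:17.230 svtm501 pacemaker-attrd [24774] (attrd_cib_callback) info: * Wrote shutdown[svtm503]=1741678740",
]

for node, value, age in find_stale_shutdown_values(log_lines):
    minutes, seconds = divmod(age, 60)
    print(f"{node}: shutdown={value} is {minutes} min {seconds} s older than the newest value")
# svtm503: shutdown=1741678740 is 45 min 17 s older than the newest value
```

Measuring age against the newest value in the batch, rather than against the log timestamp, keeps the check independent of the hosts' time zone.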

I am wondering whether the failed (and therefore delayed) write of the shutdown attribute for svtm503 caused the unexpected Pacemaker shutdown on svtm503. If not, what else could have caused it?

We noticed that this started occurring in 2.1.9-1.

What is the impact of this issue to you? Unexpected cluster outage when a node is rebooted.

Please provide the package NVR for which the bug is seen:

Pacemaker 2.1.9-1

How reproducible is this bug?: Seen only once so far.

Steps to reproduce

Set up a cluster of 4 nodes.
Reboot a node. Wait for it to rejoin the cluster successfully after the host comes back online.
Reboot another node. Pacemaker is shut down on the previously rebooted node. So far, this has been seen only once.

Expected results: No unexpected Pacemaker shutdown on any other node when one node is rebooted.

Actual results: Pacemaker was shut down on another node.


unexpect-node-shutdown.tar.bz2
16.88 MB
2025/03/13 4:22 PM

impacts account

RHEL-23082 Avoid "shutdown" node attribute persisting after shutdown [rhel-10]

Release Pending

Assignee: Christopher Lumens
Reporter: Lan Pham
Contributing Groups: IBM Confidential Group
Developer: Christopher Lumens
QA Contact: Cluster QE
Votes: 0
Watchers: 6
Created: 2025/03/13 4:19 PM
Updated: 2025/11/25 5:23 PM
Resolved: 2025/03/17 7:48 PM
