- Bug
- Resolution: Unresolved
- rhel-9.4
- rhel-sst-high-availability
What were you trying to do that didn't work?
- Running a Pacemaker domain and testing HA by killing resources to check the automation.
```
Mar 15 05:18:27 p9dpflnx01 pacemakerd[162767]: crit: pacemaker-schedulerd[162772] is unresponsive to IPC after 5 attempts and will now be killed
Mar 15 05:18:27 p9dpflnx01 pacemakerd[162767]: notice: Stopping pacemaker-schedulerd
Mar 15 05:18:27 p9dpflnx01 pacemaker-controld[162773]: error: Connection to pengine IPC failed
Mar 15 05:18:27 p9dpflnx01 pacemakerd[162767]: warning: pacemaker-schedulerd[162772] terminated with signal 9 (Killed)
Mar 15 05:18:27 p9dpflnx01 pacemaker-controld[162773]: error: Connection to pengine closed
Mar 15 05:18:27 p9dpflnx01 pacemakerd[162767]: notice: Respawning pacemaker-schedulerd subdaemon after unexpected exit
Mar 15 05:18:27 p9dpflnx01 pacemaker-controld[162773]: crit: Lost connection to the scheduler
Mar 15 05:18:27 p9dpflnx01 pacemaker-schedulerd[3948119]: notice: Starting Pacemaker scheduler
Mar 15 05:18:27 p9dpflnx01 pacemaker-schedulerd[3948119]: notice: Pacemaker scheduler successfully started and accepting connections
Mar 15 05:18:27 p9dpflnx01 pacemaker-controld[162773]: notice: Saved Cluster Information Base to /var/lib/pacemaker/pengine/pe-core-7890d349-3810-48ea-a0f0-ea44dbe2ad34.bz2 after scheduler crash
Mar 15 05:18:27 p9dpflnx01 pacemaker-controld[162773]: error: Input I_ERROR received in state S_POLICY_ENGINE from save_cib_contents
Mar 15 05:18:27 p9dpflnx01 pacemaker-controld[162773]: warning: State transition S_POLICY_ENGINE -> S_RECOVERY
Mar 15 05:18:27 p9dpflnx01 pacemaker-controld[162773]: warning: Fast-tracking shutdown in response to errors
Mar 15 05:18:27 p9dpflnx01 pacemaker-controld[162773]: warning: Not voting in election, we're in state S_RECOVERY
Mar 15 05:18:27 p9dpflnx01 pacemaker-controld[162773]: error: Input I_TERMINATE received in state S_RECOVERY from do_recover
Mar 15 05:18:27 p9dpflnx01 pacemaker-controld[162773]: notice: Stopped 0 recurring operations at shutdown (11 remaining)
Mar 15 05:18:27 p9dpflnx01 pacemaker-controld[162773]: notice: Recurring action db2_partitionset_jstamko2_0_1_2:16819 (db2_partitionset_jstamko2_0_1_2_monitor_10000) incomplete at shutdown
Mar 15 05:18:27 p9dpflnx01 pacemaker-controld[162773]: error: 1 resource was active at shutdown
Mar 15 05:18:27 p9dpflnx01 pacemaker-controld[162773]: notice: Disconnected from Corosync
Mar 15 05:18:28 p9dpflnx01 pacemaker-controld[162773]: error: Could not recover from internal error
Mar 15 05:18:28 p9dpflnx01 pacemakerd[162767]: error: pacemaker-controld[162773] exited with status 1 (Error occurred)
Mar 15 05:18:28 p9dpflnx01 pacemakerd[162767]: notice: Respawning pacemaker-controld subdaemon after unexpected exit
```
I want to know how to find the root cause of this failure and how to avoid it.
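One possible starting point, sketched below on the assumption that the standard Pacemaker CLI tools (crm_report, crm_simulate) are installed and that the pe-core file named in the log above still exists; the dates are placeholders taken from the log timestamps and should be adjusted to the actual incident:

```
# Collect cluster logs and state around the failure window
# (adjust the dates/times to the real incident; "Mar 15 05:18" comes from the log above)
crm_report --from "2024-03-15 05:00:00" --to "2024-03-15 05:30:00" /tmp/scheduler-crash

# Replay the CIB snapshot that pacemaker-controld saved when the scheduler crashed,
# to see whether a particular transition calculation fails or takes unusually long
crm_simulate --simulate --xml-file /var/lib/pacemaker/pengine/pe-core-7890d349-3810-48ea-a0f0-ea44dbe2ad34.bz2

# Look for earlier warnings from the Pacemaker subdaemons in the minutes before the kill
journalctl -u pacemaker --since "2024-03-15 05:00:00" --until "2024-03-15 05:20:00"
```

If replaying that snapshot shows the scheduler erroring or stalling, the saved input is likely the trigger; if it replays cleanly, the IPC timeout was more plausibly caused by conditions on the node at that moment (load, memory pressure, blocked I/O), which the journal and sar/sysstat data for that window may confirm.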
What is the impact of this issue to you?
- An unexpected Pacemaker process failure creates timing windows during which automation is disabled for a short period while the killed subdaemon is respawned.
Please provide the package NVR for which the bug is seen:
2.1.6-4
How reproducible is this bug?: Not easily reproducible; we need to wait for the Pacemaker failure to occur.
Steps to reproduce:
Expected results: Pacemaker stays up without error
Actual results: The Pacemaker scheduler process is killed by pacemakerd and restarted
How can I find any logs related to this? Both the system log (/var/log/messages) and the Pacemaker log (pacemaker.log) only show that Pacemaker detected an internal error and was killed.
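In case it helps while waiting for the next occurrence, a minimal sketch of raising the detail-log verbosity for the subdaemons seen failing above, assuming the stock /etc/sysconfig/pacemaker shipped with pacemaker 2.1.x (confirm the option names against the comments in that file):

```
# /etc/sysconfig/pacemaker -- excerpt (sketch; verify against the comments
# shipped in this file on your system)

# Log debug-severity messages for the subdaemons involved in the failure
PCMK_debug=pacemaker-schedulerd,pacemaker-controld

# Detail log location (this is already the default on RHEL 9)
PCMK_logfile=/var/log/pacemaker/pacemaker.log
```

These settings are read when the Pacemaker daemons start, so they take effect after the cluster services are restarted on the node (for example with pcs cluster stop/start), which makes this only practical ahead of the next failure rather than after the fact.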