- Bug
- Resolution: Unresolved
- rhel-9.4
- rhel-sst-high-availability
What were you trying to do that didn't work?
- Running a Pacemaker domain and testing HA by killing resources to check the automation.
```
Mar 15 05:18:27 p9dpflnx01 pacemakerd[162767]: crit: pacemaker-schedulerd[162772] is unresponsive to IPC after 5 attempts and will now be killed
Mar 15 05:18:27 p9dpflnx01 pacemakerd[162767]: notice: Stopping pacemaker-schedulerd
Mar 15 05:18:27 p9dpflnx01 pacemaker-controld[162773]: error: Connection to pengine IPC failed
Mar 15 05:18:27 p9dpflnx01 pacemakerd[162767]: warning: pacemaker-schedulerd[162772] terminated with signal 9 (Killed)
Mar 15 05:18:27 p9dpflnx01 pacemaker-controld[162773]: error: Connection to pengine closed
Mar 15 05:18:27 p9dpflnx01 pacemakerd[162767]: notice: Respawning pacemaker-schedulerd subdaemon after unexpected exit
Mar 15 05:18:27 p9dpflnx01 pacemaker-controld[162773]: crit: Lost connection to the scheduler
Mar 15 05:18:27 p9dpflnx01 pacemaker-schedulerd[3948119]: notice: Starting Pacemaker scheduler
Mar 15 05:18:27 p9dpflnx01 pacemaker-schedulerd[3948119]: notice: Pacemaker scheduler successfully started and accepting connections
Mar 15 05:18:27 p9dpflnx01 pacemaker-controld[162773]: notice: Saved Cluster Information Base to /var/lib/pacemaker/pengine/pe-core-7890d349-3810-48ea-a0f0-ea44dbe2ad34.bz2 after scheduler crash
Mar 15 05:18:27 p9dpflnx01 pacemaker-controld[162773]: error: Input I_ERROR received in state S_POLICY_ENGINE from save_cib_contents
Mar 15 05:18:27 p9dpflnx01 pacemaker-controld[162773]: warning: State transition S_POLICY_ENGINE -> S_RECOVERY
Mar 15 05:18:27 p9dpflnx01 pacemaker-controld[162773]: warning: Fast-tracking shutdown in response to errors
Mar 15 05:18:27 p9dpflnx01 pacemaker-controld[162773]: warning: Not voting in election, we're in state S_RECOVERY
Mar 15 05:18:27 p9dpflnx01 pacemaker-controld[162773]: error: Input I_TERMINATE received in state S_RECOVERY from do_recover
Mar 15 05:18:27 p9dpflnx01 pacemaker-controld[162773]: notice: Stopped 0 recurring operations at shutdown (11 remaining)
Mar 15 05:18:27 p9dpflnx01 pacemaker-controld[162773]: notice: Recurring action db2_partitionset_jstamko2_0_1_2:16819 (db2_partitionset_jstamko2_0_1_2_monitor_10000) incomplete at shutdown
Mar 15 05:18:27 p9dpflnx01 pacemaker-controld[162773]: error: 1 resource was active at shutdown
Mar 15 05:18:27 p9dpflnx01 pacemaker-controld[162773]: notice: Disconnected from Corosync
Mar 15 05:18:28 p9dpflnx01 pacemaker-controld[162773]: error: Could not recover from internal error
Mar 15 05:18:28 p9dpflnx01 pacemakerd[162767]: error: pacemaker-controld[162773] exited with status 1 (Error occurred)
Mar 15 05:18:28 p9dpflnx01 pacemakerd[162767]: notice: Respawning pacemaker-controld subdaemon after unexpected exit
```
I want to know how to find the root cause of this failure and how to avoid it.
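One possible starting point, sketched below on the assumption that the standard Pacemaker CLI tools (crm_report, crm_simulate) are installed and that the pe-core file named in the log above still exists; the dates are placeholders taken from the log timestamps and should be adjusted to the actual incident:

```
# Collect cluster logs and state around the failure window
# (adjust the dates/times to the real incident; "Mar 15 05:18" comes from the log above)
crm_report --from "2024-03-15 05:00:00" --to "2024-03-15 05:30:00" /tmp/scheduler-crash

# Replay the CIB snapshot that pacemaker-controld saved when the scheduler crashed,
# to see whether a particular transition calculation fails or takes unusually long
crm_simulate --simulate --xml-file /var/lib/pacemaker/pengine/pe-core-7890d349-3810-48ea-a0f0-ea44dbe2ad34.bz2

# Look for earlier warnings from the Pacemaker subdaemons in the minutes before the kill
journalctl -u pacemaker --since "2024-03-15 05:00:00" --until "2024-03-15 05:20:00"
```

If replaying that snapshot shows the scheduler erroring or stalling, the saved input is likely the trigger; if it replays cleanly, the IPC timeout was more plausibly caused by conditions on the node at that moment (load, memory pressure, blocked I/O), which the journal and sar/sysstat data for that window may confirm.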
What is the impact of this issue to you?
- An unexpected Pacemaker process failure creates timing windows during which automation is disabled for a short period while the killed subdaemon is respawned.
Please provide the package NVR for which the bug is seen:
2.1.6-4
How reproducible is this bug?: Not easily reproducible; we need to wait for the Pacemaker failure to occur.
Steps to reproduce:
Expected results: Pacemaker stays up without error
Actual results: The Pacemaker scheduler process is killed by pacemakerd and restarted
How can I find any logs related to this? Both the system log (/var/log/messages) and the Pacemaker log (pacemaker.log) only show that Pacemaker detected an internal error and was killed.
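In case it helps while waiting for the next occurrence, a minimal sketch of raising the detail-log verbosity for the subdaemons seen failing above, assuming the stock /etc/sysconfig/pacemaker shipped with pacemaker 2.1.x (confirm the option names against the comments in that file):

```
# /etc/sysconfig/pacemaker -- excerpt (sketch; verify against the comments
# shipped in this file on your system)

# Log debug-severity messages for the subdaemons involved in the failure
PCMK_debug=pacemaker-schedulerd,pacemaker-controld

# Detail log location (this is already the default on RHEL 9)
PCMK_logfile=/var/log/pacemaker/pacemaker.log
```

These settings are read when the Pacemaker daemons start, so they take effect after the cluster services are restarted on the node (for example with pcs cluster stop/start), which makes this only practical ahead of the next failure rather than after the fact.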