Bug
Resolution: Duplicate
Critical
rhel-9.2.0
sst_high_availability
ssg_filesystems_storage_and_HA
ppc64le
What were you trying to do that didn't work?
In a cluster consisting of 2 cluster nodes and 2 Pacemaker remote nodes, taking down the public network on one of the Pacemaker remote nodes caused the Pacemaker control daemon (pacemaker-controld) on the DC to hang and subsequently be killed. This is not expected. The expected behaviour is that the DC detects the remote node as OFFLINE and performs a fencing operation.
Please provide the package NVR for which bug is seen:
Pacemaker 2.1.6-4.db2pcmk.el9
How reproducible:
Hit the issue on the second attempt.
Steps to reproduce
- Create a cluster that consists of 2 cluster nodes and 2 Pacemaker remote nodes
- Run ifconfig <interface> down on the public interface of one of the 2 Pacemaker remote nodes
- Shortly after the interface goes down, the DC node is reported as OFFLINE and Pacemaker is restarted on the DC host.
Expected results: The remote node is detected as OFFLINE and the DC performs node recovery.
Actual results: The Pacemaker control daemon on the DC stopped responding to IPC, and pacemakerd logged the errors below before killing and respawning it. The DC role then restarted on a different host:
Jul 17 15:59:49.331 p10rhel094 pacemakerd [2271] (pcmk__ipc_is_authentic_process_active) info: Could not connect to crmd IPC: timeout
Jul 17 15:59:49.331 p10rhel094 pacemakerd [2271] (check_next_subdaemon) notice: pacemaker-controld[2504] is unresponsive to ipc after 1 tries
Jul 17 16:00:17.331 p10rhel094 pacemakerd [2271] (pcmk__ipc_is_authentic_process_active) info: Could not connect to crmd IPC: timeout
Jul 17 16:00:17.331 p10rhel094 pacemakerd [2271] (check_next_subdaemon) error: pacemaker-controld[2504] is unresponsive to ipc after 5 tries but we found the pid so have it killed that we can restart
Jul 17 16:00:17.331 p10rhel094 pacemakerd [2271] (pcmk_child_exit) warning: pacemaker-controld[2504] terminated with signal 9 (Killed)
Jul 17 16:00:17.331 p10rhel094 pacemakerd [2271] (pcmk__ipc_is_authentic_process_active) info: Could not connect to crmd IPC: Connection refused
Jul 17 16:00:17.331 p10rhel094 pacemakerd [2271] (pcmk_process_exit) notice: Respawning pacemaker-controld subdaemon after unexpected exit
Jul 17 16:00:17.331 p10rhel094 pacemakerd [2271] (start_child) info: Using uid=189 and group=189 for process pacemaker-controld
Jul 17 16:00:17.331 p10rhel094 pacemakerd [2271] (start_child) info: Forked child 3602000 for process pacemaker-controld
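To check other hosts or later reproductions for the same symptom, the log excerpt above can be scanned for the "is unresponsive to ipc" messages. The following is a minimal sketch, not a Pacemaker tool: the line layout in LINE_RE is inferred only from the excerpt above, the function names parse_line and summarize_unresponsive are hypothetical, and because the log timestamps carry no year an arbitrary one (assumed_year) is supplied so that time differences can be computed. It also assumes English month abbreviations.

```python
import re
from datetime import datetime

# Assumed layout, inferred from the excerpt in this report:
# "<Mon DD HH:MM:SS.mmm> <host> <daemon> [<pid>] (<function>) <severity>: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d{2}:\d{2}:\d{2}\.\d+) +(?P<host>\S+) +(?P<daemon>\S+) +"
    r"\[(?P<pid>\d+)\] +\((?P<func>[^)]+)\) +(?P<sev>\w+): (?P<msg>.*)$"
)

# Matches the pacemakerd message about a subdaemon that does not answer IPC.
UNRESPONSIVE_RE = re.compile(
    r"(?P<child>\S+)\[(?P<child_pid>\d+)\] is unresponsive to ipc after (?P<tries>\d+) tries"
)


def parse_line(line, assumed_year=2000):
    """Split one log line into fields; return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    # The log has no year, so an assumed one is prepended (hypothetical value).
    rec["time"] = datetime.strptime(
        f"{assumed_year} {rec['ts']}", "%Y %b %d %H:%M:%S.%f"
    )
    return rec


def summarize_unresponsive(lines):
    """Report which subdaemon was unresponsive, the highest try count seen,
    and the seconds between the first and last such message."""
    events = []
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue
        m = UNRESPONSIVE_RE.search(rec["msg"])
        if m:
            events.append((rec["time"], m["child"], int(m["tries"])))
    if not events:
        return None
    first_time, child, _ = events[0]
    last_time = events[-1][0]
    return {
        "child": child,
        "max_tries": max(tries for _, _, tries in events),
        "seconds": (last_time - first_time).total_seconds(),
    }


if __name__ == "__main__":
    sample = [
        "Jul 17 15:59:49.331 p10rhel094 pacemakerd [2271] (check_next_subdaemon) "
        "notice: pacemaker-controld[2504] is unresponsive to ipc after 1 tries",
        "Jul 17 16:00:17.331 p10rhel094 pacemakerd [2271] (check_next_subdaemon) "
        "error: pacemaker-controld[2504] is unresponsive to ipc after 5 tries "
        "but we found the pid so have it killed that we can restart",
    ]
    print(summarize_unresponsive(sample))
```

Run against the two check_next_subdaemon lines quoted above, the summary shows pacemaker-controld going from 1 to 5 failed tries over 28 seconds before pacemakerd killed it.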
Duplicates: RHEL-34276 Pacemaker remote resource migration failed (In Progress)