Bug
Resolution: Unresolved
Normal
rhel-10.0
rhel-sst-high-availability
What were you trying to do that didn't work?
When adding a new remote node with target-role=Stopped, unusual behavior occurs:
- The cluster fences the remote node (is this the expected behavior?).
- After waiting for that fence action to complete, the cluster fences the remote node again when we try to manually enable it.
- After the second fence action, enabling the node does not succeed: the start operation on the remote resource times out (see the pcs status output below).
In addition, when fencing is not configured for the remote node, crm_resource --wait gets stuck on the pending fencing action, which for example makes it impossible to remove the remote node (a minimal command sequence is sketched below).
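A minimal sketch of that second scenario, assuming a cluster whose fence devices do not cover virt-541, so the scheduled fencing of the remote node can never complete (node name reused from the reproducer below):

{no fence device is configured that can fence virt-541}
[root@virt-537 ~]# pcs cluster node add-remote virt-541 meta target-role=Stopped
[root@virt-537 ~]# crm_resource --wait
{never returns: it keeps waiting on the pending fencing action, which also blocks 'pcs cluster node remove-remote virt-541'}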
Please provide the package NVR for which the bug is seen:
pacemaker-3.0.0-5.el10.x86_64
How reproducible is this bug?:
always
Steps to reproduce:
[root@virt-537 ~]# pcs cluster node add-remote virt-541 meta target-role=Stopped
No addresses specified for host 'virt-541', using 'virt-541'
Sending 'pacemaker authkey' to 'virt-541'
virt-541: successful distribution of the file 'pacemaker authkey'
Requesting 'pacemaker_remote enable', 'pacemaker_remote start' on 'virt-541'
virt-541: successful run of 'pacemaker_remote enable'
virt-541: successful run of 'pacemaker_remote start'
[root@virt-537 ~]# echo $?
0
[root@virt-537 ~]# pcs stonith history
reboot of virt-541 successful: delegate=virt-536, client=pacemaker-controld.350889, origin=virt-537, completed='2025-01-28 15:19:49.962541 +01:00'
1 event found

{wait for the fence to complete}

[root@virt-537 ~]# pcs resource enable virt-541

{this will take some time to reach timeout}

[root@virt-537 ~]# pcs status
Cluster name: STSRHTS10555
Cluster Summary:
  * Stack: corosync (Pacemaker is running)
  * Current DC: virt-537 (version 3.0.0-5.el10-5b53b7e) - partition with quorum
  * Last updated: Tue Jan 28 15:28:25 2025 on virt-537
  * Last change: Tue Jan 28 15:26:21 2025 by root via root on virt-537
  * 3 nodes configured
  * 4 resource instances configured

Node List:
  * Online: [ virt-536 virt-537 ]
  * RemoteOFFLINE: [ virt-541 ]

Full List of Resources:
  * fence-virt-536 (stonith:fence_xvm): Started virt-536
  * fence-virt-537 (stonith:fence_xvm): Started virt-537
  * fence-virt-541 (stonith:fence_xvm): Started virt-536
  * virt-541 (ocf:pacemaker:remote): Stopped

Failed Resource Actions:
  * virt-541 start on virt-537 could not be executed (Timed out: Connection refused without enough time to retry) at Tue Jan 28 15:27:20 2025
  * virt-541 start on virt-536 could not be executed (Timed out: Connection refused without enough time to retry) at Tue Jan 28 15:26:21 2025

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

[root@virt-537 ~]# pcs stonith history
reboot of virt-541 successful: delegate=virt-536, client=pacemaker-controld.350889, origin=virt-537, completed='2025-01-28 15:28:21.468659 +01:00'
reboot of virt-541 successful: delegate=virt-536, client=pacemaker-controld.350889, origin=virt-537, completed='2025-01-28 15:19:49.962541 +01:00'
2 events found
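To help the investigation, one way to confirm what pcs actually wrote to the CIB (a hedged sketch using standard pcs/pacemaker query commands; output omitted here):

[root@virt-537 ~]# pcs resource config virt-541
{shows the remote-node primitive and its meta attributes, including target-role=Stopped}
[root@virt-537 ~]# cibadmin --query --xpath '//primitive[@id="virt-541"]'
{raw CIB view of the same resource}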
Additional info:
This Jira was created mainly to investigate this behavior; it might be closed as notabug if the behavior is expected.