What were you trying to do that didn't work?
I tried to remove the stonith devices and stop the cluster so that I could set up sbd.
Please provide the package NVR for which the bug is seen:
since pacemaker-2.1.6-7.el9.x86_64
How reproducible:
Intermittently, roughly a 50% chance
Steps to Reproduce:
- set up a two-node cluster
- check which node is the DC
- on the DC node, remove the stonith devices and stop the cluster:
pcs stonith delete fence-virt-252; pcs stonith delete fence-virt-253; pcs cluster stop --all
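The steps above can be sketched as the following commands (a reproduction sketch, not output from the affected cluster; the fence device names `fence-virt-252`/`fence-virt-253` are the ones used in this report and assume an already-running two-node cluster):

```shell
# Identify the current DC (Designated Controller); run on any cluster node.
pcs status | grep "Current DC"

# On the node reported as the DC, remove the stonith devices,
# then stop the whole cluster:
pcs stonith delete fence-virt-252
pcs stonith delete fence-virt-253
pcs cluster stop --all
```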
Expected results:
Stonith devices are deleted and the cluster stops.
Actual results:
The cluster gets stuck while stopping:
[root@virt-253 ~]# pcs stonith delete fence-virt-252; pcs stonith delete fence-virt-253; pcs cluster stop --all
Attempting to stop: fence-virt-252... Stopped
Attempting to stop: fence-virt-253... Stopped
virt-252: Stopping Cluster (pacemaker)...
[root@virt-253 ~]# pcs status --full
Cluster name: STSRHTS14392
WARNINGS:
No stonith devices and stonith-enabled is not false
Cluster Summary:
* Stack: corosync (Pacemaker daemons are shutting down)
* Current DC: virt-253 (2) (version 2.1.6-9.el9-6fdc9deea29) - MIXED-VERSION partition with quorum
* Last updated: Fri Oct 13 13:16:22 2023 on virt-253
* Last change: Fri Oct 13 13:15:18 2023 by root via cibadmin on virt-252
* 2 nodes configured
* 0 resource instances configured
Node List:
* Node virt-252 (1): pending, feature set <3.15.1
* Node virt-253 (2): online, feature set 3.17.4
Full List of Resources:
* No resources
Migration Summary:
Tickets:
PCSD Status:
virt-252: Online
virt-253: Online
Daemon Status:
corosync: active/enabled
pacemaker: inactive/enabled
pcsd: active/enabled
After about 15 minutes of being stuck (the `cluster-recheck-interval`, I assume), the cluster finally stops.
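The 15-minute delay matches Pacemaker's default `cluster-recheck-interval`. The property can be inspected and, for testing, lowered (a general pcs usage sketch, not commands run on the affected cluster; the 2-minute value is an arbitrary example):

```shell
# Show the current value of cluster-recheck-interval
# (Pacemaker's default is 15min if the property is unset).
pcs property show cluster-recheck-interval

# Lower it, e.g. for testing, so a stuck transition is re-evaluated sooner:
pcs property set cluster-recheck-interval=2min
```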
I created a crm_report from the incident and attached it. The cluster got stuck on the stop action around Oct 13 13:15.
[^cluster-froze-when-stop.tar.bz2]
Clones:
- RHEL-13216 Revert broken attempt to fix "cluster got stuck while stopping" [rhel-9] (Closed)