- Bug
- Resolution: Done-Errata
- Critical
- rhel-9.2.0, rhel-9.3.0
- pacemaker-2.1.7-4.el9
- Yes
- Important
- ZStream, Regression
- rhel-sst-high-availability
- ssg_filesystems_storage_and_HA
- 22
- 26
- 8
- QE ack, Dev ack
- False
- None
- Red Hat Enterprise Linux
- None
- Approved Blocker
- Pass
- RegressionOnly
- All
- All
- 2.1.7
- None
What were you trying to do that didn't work?
I tried to remove the stonith devices and stop the cluster so that I could set up sbd.
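For context, a minimal sketch of the sbd setup this was clearing the way for, assuming a shared-disk configuration; the device path and timeout below are illustrative placeholders, not values from this report:

# sbd can only be enabled while the cluster is stopped, hence the
# stonith delete + cluster stop sequence described in this report.
pcs cluster stop --all

# Enable sbd on all nodes; the device path is a placeholder and the
# watchdog timeout is an illustrative value.
pcs stonith sbd enable device=/dev/disk/by-id/example-shared-disk SBD_WATCHDOG_TIMEOUT=10

# Bring the cluster back up and verify the sbd configuration.
pcs cluster start --all
pcs stonith sbd status
pcs stonith sbd config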
Please provide the package NVR for which the bug is seen:
Seen since pacemaker-2.1.6-7.el9.x86_64
How reproducible:
Sometimes, roughly a 50% chance
Steps to reproduce
- set up a two-node cluster
- check which node is the DC (see the sketch after this list)
- on the DC node, remove the stonith devices and stop the cluster:
  pcs stonith delete fence-virt-252; pcs stonith delete fence-virt-253; pcs cluster stop --all
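One way to find the DC mentioned in the second step (a sketch using standard tooling; the grep pattern matches the status output shown further below):

# Print the Designated Controller line; run the reproducer on that node.
pcs status | grep "Current DC"
# crm_mon reports the same information in a single shot:
crm_mon -1 | grep "Current DC"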
Expected results
Stonith devices are deleted and the cluster stops.
Actual results
The cluster gets stuck while stopping:
[root@virt-253 ~]# pcs stonith delete fence-virt-252; pcs stonith delete fence-virt-253; pcs cluster stop --all
Attempting to stop: fence-virt-252... Stopped
Attempting to stop: fence-virt-253... Stopped
virt-252: Stopping Cluster (pacemaker)...
[root@virt-253 ~]# pcs status --full
Cluster name: STSRHTS14392
WARNINGS:
No stonith devices and stonith-enabled is not false
Cluster Summary:
* Stack: corosync (Pacemaker daemons are shutting down)
* Current DC: virt-253 (2) (version 2.1.6-9.el9-6fdc9deea29) - MIXED-VERSION partition with quorum
* Last updated: Fri Oct 13 13:16:22 2023 on virt-253
* Last change: Fri Oct 13 13:15:18 2023 by root via cibadmin on virt-252
* 2 nodes configured
* 0 resource instances configured
Node List:
* Node virt-252 (1): pending, feature set <3.15.1
* Node virt-253 (2): online, feature set 3.17.4
Full List of Resources:
* No resources
Migration Summary:
Tickets:
PCSD Status:
virt-252: Online
virt-253: Online
Daemon Status:
corosync: active/enabled
pacemaker: inactive/enabled
pcsd: active/enabled
After being stuck for about 15 minutes (the `cluster-recheck-interval`, I assume), the cluster finally stops.
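To confirm that theory, the property can be inspected and temporarily lowered so the stuck shutdown clears sooner (a sketch; the 2-minute value is an illustrative assumption):

# Show the current value; the default is 15 minutes, matching the delay observed here.
pcs property config cluster-recheck-interval
# Temporarily lower it while reproducing (illustrative value).
pcs property set cluster-recheck-interval=2min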
I created a crm_report from the incident and attached it. The cluster got stuck on the stop action around Oct 13 13:15.
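The attached report was produced along these lines (a sketch; the time window and destination path are assumptions derived from the timestamps above):

# Gather logs and cluster state around the incident into a tarball for attachment.
crm_report --from "2023-10-13 13:00" --to "2023-10-13 13:30" /tmp/stuck-shutdown-report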
- is cloned by: RHEL-23082 Avoid "shutdown" node attribute persisting after shutdown (In Progress)
- links to: RHBA-2023:125612 pacemaker bug fix and enhancement update