Bug
Resolution: Unresolved
rhel-8.9.0
rhel-sst-high-availability
ssg_filesystems_storage_and_HA
Red Hat Enterprise Linux
What were you trying to do that didn't work?
In a 2-node cluster with a qdevice server as an arbitrator, both nodes lose quorum when node1's network is blocked and then recovers after the token timeout but within the consensus timeout. The issue occurs regardless of which qdevice algorithm is used (lms/ffsplit).
Please provide the package NVR for which bug is seen:
corosynclib-3.1.7-1.el8.x86_64
corosync-3.1.7-1.el8.x86_64
corosync-qdevice-3.0.2-1.el8_8.1.x86_64
How reproducible:
Always
Steps to reproduce
Create cluster consisting of:
- node1
- node2
- qdevice
Set the totem token timeout to 20000 ms. The issue depends heavily on timing, and a long token timeout makes it easier to trigger: it leaves enough time to restore the network after the token timeout has expired but before the consensus timeout expires (after 35 s in this reproducer).
[root@c-rhel8-node1 ~]# date; logger -t ========== algorighm=ffsplit, sleep 35 ==========; ./net_breaker.sh BreakCommCmd node1; sleep 35; ./net_breaker.sh FixCommCmd node1;
(net_breaker.sh is from https://access.redhat.com/solutions/79523)
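The choice of 35 s can be sanity-checked with a small calculation. This is only a sketch under two assumptions that the report does not state explicitly: that `consensus` is not set in corosync.conf, so it takes the documented default of 1.2 * token, and that the problematic window runs from token expiry until token + consensus. The variable names are illustrative only.

```shell
#!/bin/sh
# Assumption: consensus is left at its default of 1.2 * token.
token_ms=20000
consensus_ms=$(( token_ms * 12 / 10 ))

# Assumed window in which restoring the network triggers the issue:
# after the token timeout, before token + consensus has elapsed.
window_start_ms=$token_ms
window_end_ms=$(( token_ms + consensus_ms ))

# Outage length used in the reproducer (sleep 35).
outage_ms=35000

if [ "$outage_ms" -gt "$window_start_ms" ] && [ "$outage_ms" -lt "$window_end_ms" ]; then
    in_window=yes
else
    in_window=no
fi

echo "window: ${window_start_ms}..${window_end_ms} ms, outage ${outage_ms} ms, in window: ${in_window}"
```

If a different token value is used, the sleep in the reproducer would need to be adjusted so that it still falls inside this window.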
Expected results
No quorum loss on node2
Actual results
Both nodes lost quorum temporarily, and node2 was fenced.
- clones
RHEL-13711 Quorum lost when network recovers within consensus period (RHEL 9)