Bug | Resolution: Unresolved | Normal | rhel-10.0 | rhel-ha | x86_64
What were you trying to do that didn't work?
SCENARIO
cluster1 running on one site, cluster2 running on the second site, arbitrator running on a third site. Booth ticket granted to cluster2.
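For reference, a minimal booth.conf matching this scenario could look as follows; the IP addresses (except 192.168.100.250, which appears in the logs below) and the expire value are illustrative assumptions, not taken from the affected environment:
# /etc/booth/booth.conf -- illustrative sketch of the three-site layout
transport = UDP
port = 9929
authfile = /etc/booth/booth.key
# cluster1 site
site = 192.168.100.150
# cluster2 site (ticket currently granted here; address taken from the logs below)
site = 192.168.100.250
# arbitrator on the third site
arbitrator = 192.168.100.50
ticket = "apacheticket"
    expire = 600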
ISSUE DESCRIPTION
After a reset of cluster2, simulating a disaster, booth on the Leader site starts without any ticket state. It tries to load the state from pcmk using 'crm_ticket', which fails with exit code 105. It looks like pcmk refuses the connection for a certain time after it starts up.
Oct 21 15:29:01 cluster2-node1.example.com boothd-site[1877]: [debug] reset_ticket:531: apacheticket (Init/0/0): next state reset
Oct 21 15:29:01 cluster2-node1.example.com boothd-site[1877]: [error] apacheticket (Init/0/0): crm_ticket xml output empty
Oct 21 15:29:01 cluster2-node1.example.com boothd-site[1877]: [warning] apacheticket: no site matches; site got reconfigured?
Oct 21 15:29:01 cluster2-node1.example.com boothd-site[1877]: [error] command "crm_ticket -t 'apacheticket' -q" exit code 105
Oct 21 15:29:01 cluster2-node1.example.com boothd-site[1877]: [info] apacheticket (Init/0/0): broadcasting state query
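The failing query can be reproduced by hand right after the node comes back up (sketch; the ticket name is the one from the logs, and the second part only illustrates that the same command succeeds once pcmk accepts connections again):
# run the same query booth issues at startup and show its exit code
crm_ticket -t 'apacheticket' -q; echo "exit code: $?"
# shortly after boot this prints an empty XML result and exit code 105;
# repeating it later on the fully started node returns the ticket state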
Booth is able to get the ticket information from another Follower booth instance, but only if at least one booth instance "survives". If all instances are reset, the ticket is lost.
Booth is able to get the ticket information from pcmk if only the booth daemon is restarted on a running system (instead of rebooting the whole node):
[root@cluster2-node1 ~]# ps aux | grep booth
haclust+ 1922 0.0 0.4 15932 15676 ? SLs 17:18 0:00 boothd daemon -c /etc/booth/booth.conf
root 16865 0.0 0.0 6380 2016 pts/0 S+ 17:54 0:00 grep --color=auto booth
[root@cluster2-node1 ~]# killall -9 boothd
[root@cluster2-node1 ~]# ps aux | grep booth
haclust+ 17069 0.0 0.4 16060 15920 ? SLs 17:54 0:00 boothd daemon -c /etc/booth/booth.conf
root 18503 0.0 0.0 6380 2064 pts/0 S+ 17:57 0:00 grep --color=auto booth
Oct 21 17:54:27 cluster2-node1.example.com booth[17068]: [debug] read key of size 64 in authfile /etc/booth/booth.key
Oct 21 17:54:27 cluster2-node1.example.com booth[17068]: [debug] found myself at 192.168.100.250 (32 bits matched)
Oct 21 17:54:27 cluster2-node1.example.com boothd-site[17069]: [info] BOOTH site 1.2 daemon is starting
Oct 21 17:54:27 cluster2-node1.example.com boothd-site[17069]: [debug] disown_ticket:509: apacheticket (/0/0): ticket leader set to NONE
Oct 21 17:54:27 cluster2-node1.example.com boothd-site[17069]: [debug] reset_ticket:530: apacheticket (/0/0): state transition: -> Init
Oct 21 17:54:27 cluster2-node1.example.com boothd-site[17069]: [debug] reset_ticket:531: apacheticket (Init/0/0): next state reset
Oct 21 17:54:27 cluster2-node1.example.com boothd-site[17069]: [debug] command "crm_ticket -t 'apacheticket' -q"
Oct 21 17:54:27 cluster2-node1.example.com boothd-site[17069]: [debug] update_ticket_state:606: apacheticket (Init/0/195899): next state set to Lead
Oct 21 17:54:27 cluster2-node1.example.com boothd-site[17069]: [info] apacheticket (Init/0/195899): broadcasting state query
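For completeness, the recovered ticket state after such a daemon-only restart can be cross-checked with standard tooling (sketch; output format differs between versions):
# ticket view as held by the booth daemon
booth list
# raw ticket state stored in the pcmk CIB, as queried by booth
crm_ticket -t 'apacheticket' -q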
Please provide the package NVR for which the bug is seen:
booth-core-1.2-3.el10.x86_64
booth-site-1.2-3.el10.noarch
pcs-0.12.0-3.el10_0.2.x86_64
pacemaker-schemas-3.0.0-5.1.el10_0.noarch
pacemaker-libs-3.0.0-5.1.el10_0.x86_64
pacemaker-cluster-libs-3.0.0-5.1.el10_0.x86_64
pacemaker-3.0.0-5.1.el10_0.x86_64
pacemaker-cli-3.0.0-5.1.el10_0.x86_64
How reproducible is this bug?:
Reproducible
Steps to reproduce
- Reset of cluster2 with cluster1 and the arbitrator always up. Result: granted ticket kept (cluster1 and the arbitrator can still provide the info).
- Reset of cluster2 with cluster1 off and the arbitrator always up. Result: granted ticket kept (the arbitrator can still provide the info).
- Poweroff/on of cluster2 with cluster1 off, arbitrator reset. Result: ticket lost.
- Poweroff/on of cluster2 with cluster1 on, arbitrator reset. Result: granted ticket kept (cluster1 can still provide the info).
- Poweroff/on of cluster2 with cluster1 off and arbitrator off, then powering on cluster1 and then cluster2. Result: ticket lost (a command-level sketch of this scenario follows the list).
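Command-level sketch of the last (ticket-losing) scenario, assuming three test machines reachable over SSH; the host names, the use of sysrq for the hard reset, and the power-on ordering comments are assumptions for illustration only:
# power off cluster1 and the arbitrator, then hard-reset cluster2
ssh cluster1-node1 'poweroff'
ssh arbitrator 'poweroff'
ssh cluster2-node1 'echo b > /proc/sysrq-trigger'   # immediate reset, no clean shutdown
# power on cluster1 first, then cluster2, then check the ticket on cluster2
ssh cluster2-node1 'booth list'
# observed per this report: the granted ticket is gone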
Expected results
Booth is able to get the ticket information from pcmk even after a reset of all booth instances.
Actual results
Booth is able to get the ticket information from another Follower booth instance only if at least one booth instance "survives". If all instances are reset, the ticket is lost.