Type: Bug
Resolution: Unresolved
Priority: Normal
Fix Version/s: None
Affects Version/s: rhel-10.0
Component/s: booth
Labels: None
Regression: None
Severity: None
AssignedTeam: rhel-ha
Story Points: 5
Blocked: False
Ready: False
Blocked Reason: None
Product Documentation Required: None
Sprint: None
Preliminary Testing: None
Test Coverage: None
ProdDocsReview-CCS: Unspecified
ProdDocsReview-Dev: Unspecified
ProdDocsReview-QE: Unspecified
Experience:
Architecture: x86_64
PX Impact Score:
SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:
Planning: None

What were you trying to do that didn't work?

SCENARIO
cluster1 runs on one site, cluster2 on a second site, and the arbitrator on a third site. The booth ticket is granted to cluster2.

ISSUE DESCRIPTION
After a reset of cluster2 (simulating a disaster), booth on the Leader starts without ticket state. It tries to load the state from Pacemaker (pcmk) using 'crm_ticket', and the command fails with exit code 105. It looks like pcmk is blocking the connection for a certain time after pcmk startup.

Oct 21 15:29:01 cluster2-node1.example.com boothd-site[1877]: [debug] reset_ticket:531: apacheticket (Init/0/0): next state reset
Oct 21 15:29:01 cluster2-node1.example.com boothd-site[1877]: [error] apacheticket (Init/0/0): crm_ticket xml output empty
Oct 21 15:29:01 cluster2-node1.example.com boothd-site[1877]: [warning] apacheticket: no site matches; site got reconfigured?
Oct 21 15:29:01 cluster2-node1.example.com boothd-site[1877]: [error] command "crm_ticket -t 'apacheticket' -q" exit code 105
Oct 21 15:29:01 cluster2-node1.example.com boothd-site[1877]: [info] apacheticket (Init/0/0): broadcasting state query
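
To check the suspicion that pcmk only rejects the query for a limited time after startup, the command from the log can be polled until it succeeds. The following is a minimal sketch, not part of booth or Pacemaker: the function names `run_command` and `probe_until_success`, the attempt count and the one-second interval are all assumptions made for illustration. Only the `crm_ticket -t 'apacheticket' -q` command line is taken from the log above.

```python
import subprocess
import time
from typing import Callable, List, Optional, Sequence, Tuple

# Command taken from the boothd log above; everything else is illustrative.
DEFAULT_CMD = ["crm_ticket", "-t", "apacheticket", "-q"]


def run_command(cmd: Sequence[str]) -> int:
    """Run cmd, discard its output, and return its exit code."""
    result = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode


def probe_until_success(
    cmd: Sequence[str] = DEFAULT_CMD,
    attempts: int = 30,        # hypothetical limit
    interval: float = 1.0,     # hypothetical delay between attempts, seconds
    runner: Callable[[Sequence[str]], int] = run_command,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[Optional[float], List[int]]:
    """Poll cmd until it exits 0 or the attempts run out.

    Returns (seconds until the first success, exit codes seen).
    The first element is None if the command never succeeded.
    """
    start = clock()
    codes: List[int] = []
    for attempt in range(attempts):
        code = runner(cmd)
        codes.append(code)
        if code == 0:
            return clock() - start, codes
        if attempt < attempts - 1:
            sleep(interval)
    return None, codes
```

Calling `probe_until_success()` on the Leader right after the node comes back up would return the list of exit codes seen and, if the query eventually succeeds, how long that took. The `runner`, `sleep` and `clock` parameters exist only so the loop can be exercised without a running cluster.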

Booth is able to get the ticket information from another Follower booth instance, but only if at least one booth instance "survives". If all instances are reset, the ticket is lost.

Booth is able to get the ticket information from pcmk if only the daemon is restarted on a running system (instead of a restart of the whole node):

[root@cluster2-node1 ~]# ps aux | grep booth
haclust+ 1922 0.0 0.4 15932 15676 ? SLs 17:18 0:00 boothd daemon -c /etc/booth/booth.conf
root 16865 0.0 0.0 6380 2016 pts/0 S+ 17:54 0:00 grep --color=auto booth
[root@cluster2-node1 ~]# killall -9 boothd
[root@cluster2-node1 ~]# ps aux | grep booth
haclust+ 17069 0.0 0.4 16060 15920 ? SLs 17:54 0:00 boothd daemon -c /etc/booth/booth.conf
root 18503 0.0 0.0 6380 2064 pts/0 S+ 17:57 0:00 grep --color=auto booth

Oct 21 17:54:27 cluster2-node1.example.com booth[17068]: [debug] read key of size 64 in authfile /etc/booth/booth.key
Oct 21 17:54:27 cluster2-node1.example.com booth[17068]: [debug] found myself at 192.168.100.250 (32 bits matched)
Oct 21 17:54:27 cluster2-node1.example.com boothd-site[17069]: [info] BOOTH site 1.2 daemon is starting
Oct 21 17:54:27 cluster2-node1.example.com boothd-site[17069]: [debug] disown_ticket:509: apacheticket (/0/0): ticket leader set to NONE
Oct 21 17:54:27 cluster2-node1.example.com boothd-site[17069]: [debug] reset_ticket:530: apacheticket (/0/0): state transition: -> Init
Oct 21 17:54:27 cluster2-node1.example.com boothd-site[17069]: [debug] reset_ticket:531: apacheticket (Init/0/0): next state reset
Oct 21 17:54:27 cluster2-node1.example.com boothd-site[17069]: [debug] command "crm_ticket -t 'apacheticket' -q"
Oct 21 17:54:27 cluster2-node1.example.com boothd-site[17069]: [debug] update_ticket_state:606: apacheticket (Init/0/195899): next state set to Lead
Oct 21 17:54:27 cluster2-node1.example.com boothd-site[17069]: [info] apacheticket (Init/0/195899): broadcasting state query
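
The failing and the working startup can be told apart from the boothd log alone. The sketch below is an illustration only: the function name `summarize_startup` and the two regular expressions are assumptions derived from the log lines quoted in this report, and it assumes that an `update_ticket_state` line with "next state set to" means the ticket state was restored.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Patterns derived from the log lines quoted in this report (assumption:
# other booth versions may format these messages differently).
FAILED_CMD_RE = re.compile(r'command "(?P<cmd>[^"]+)" exit code (?P<code>\d+)')
STATE_SET_RE = re.compile(
    r"update_ticket_state:\d+: (?P<ticket>\S+) \([^)]*\): "
    r"next state set to (?P<state>\w+)"
)


def summarize_startup(lines: Iterable[str]) -> Tuple[List[Tuple[str, int]], Dict[str, str]]:
    """Return (failed commands with exit codes, restored state per ticket)."""
    failed: List[Tuple[str, int]] = []
    restored: Dict[str, str] = {}
    for line in lines:
        match = FAILED_CMD_RE.search(line)
        if match:
            failed.append((match.group("cmd"), int(match.group("code"))))
            continue
        match = STATE_SET_RE.search(line)
        if match:
            restored[match.group("ticket")] = match.group("state")
    return failed, restored
```

Fed with the log from the node reset, this reports one failed command with exit code 105 and no restored ticket. Fed with the log from the daemon-only restart, it reports no failure and `apacheticket` restored to `Lead`.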

Please provide the package NVR for which the bug is seen:

booth-core-1.2-3.el10.x86_64
booth-site-1.2-3.el10.noarch
pcs-0.12.0-3.el10_0.2.x86_64
pacemaker-schemas-3.0.0-5.1.el10_0.noarch
pacemaker-libs-3.0.0-5.1.el10_0.x86_64
pacemaker-cluster-libs-3.0.0-5.1.el10_0.x86_64
pacemaker-3.0.0-5.1.el10_0.x86_64
pacemaker-cli-3.0.0-5.1.el10_0.x86_64

How reproducible is this bug?:

Reproducible

Steps to reproduce

1. Reset of cluster2 with cluster1 and arbitrator always up. Result: kept granted ticket (cluster1 and arbitrator can still provide the info)
2. Reset of cluster2 with cluster1 off and arbitrator always up. Result: kept granted ticket (arbitrator can still provide the info)
3. Poweroff/on of cluster2 with cluster1 off, arbitrator reset. Result: ticket lost.
4. Poweroff/on of cluster2 with cluster1 on, arbitrator reset. Result: kept granted ticket (cluster1 can still provide the info)
5. Poweroff/on of cluster2 with cluster1 off, arbitrator off, then in order cluster1 on, cluster2 on. Result: ticket lost.

Expected results

Booth is able to get the ticket information from pcmk even after a reset of all booth instances.

Actual results

Booth is able to get the ticket information from another Follower booth instance only if at least one booth instance "survives". If all instances are reset, the ticket is lost.
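
The five scenarios under "Steps to reproduce" are consistent with this rule. The following minimal sketch encodes them and checks the rule against the observed results. The names `Scenario`, `SCENARIOS` and `predicted_ticket_kept` are made up for illustration, and treating an instance that is off or reset as "not surviving" is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    description: str
    cluster1_survives: bool    # cluster1 booth stays up while cluster2 is down
    arbitrator_survives: bool  # arbitrator stays up while cluster2 is down
    ticket_kept: bool          # result observed in this report


# Observed results, in the order listed under "Steps to reproduce".
SCENARIOS = [
    Scenario("cluster2 reset, cluster1 up, arbitrator up", True, True, True),
    Scenario("cluster2 reset, cluster1 off, arbitrator up", False, True, True),
    Scenario("cluster2 poweroff/on, cluster1 off, arbitrator reset", False, False, False),
    Scenario("cluster2 poweroff/on, cluster1 on, arbitrator reset", True, False, True),
    Scenario("cluster2 poweroff/on, cluster1 off, arbitrator off", False, False, False),
]


def predicted_ticket_kept(scenario: Scenario) -> bool:
    """Rule from 'Actual results': the ticket is kept only if at least
    one other booth instance survives."""
    return scenario.cluster1_survives or scenario.arbitrator_survives


# Scenarios where the rule disagrees with the observed result.
mismatches = [s for s in SCENARIOS if predicted_ticket_kept(s) != s.ticket_kept]
```

With the results reported here, `mismatches` is empty: in every case the ticket is kept exactly when another booth instance stays up, so none of the five runs shows the state being recovered from pcmk alone.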

Assignee: Reid Wahl

Reporter: Riccardo Furlan

Developer: Christopher Lumens

QA Contact: Cluster QE

Votes: 0

Watchers: 5

Created: 2025/10/22 7:34 AM

Updated: 2025/11/17 10:48 PM

Stale Date: 2026/10/21
