RHEL-7603

before-acquire-handler does not run at other site after ticket release [RHEL 9]

    • Bug
    • Resolution: Unresolved
    • Normal
    • rhel-8.1.0, rhel-9.4
    • booth
    • Important
    • sst_high_availability
    • ssg_filesystems_storage_and_HA

      Description of problem:

      Assume that a ticket has a before-acquire-handler that checks some external condition to determine whether a site should be allowed to acquire a ticket. The handler succeeds at ticket grant time, so the ticket is granted (say, to site 1).

      Then later, when it's time to renew the ticket (after renewal-freq), the handler runs again and fails. The ticket is released, and site 2 can try to acquire it.

      When site 2 tries to acquire the released ticket, it does NOT run the before-acquire-handler. Thus, it may acquire a ticket when it is not safe to do so, because the check does not execute. Only after the renewal-freq duration passes at site 2 will the before-acquire-handler run.

      In the demonstration below, I (like the customer who reported this) use a geostore attribute to mark whether a site is allowed to acquire a ticket.

      DEMONSTRATION:

      Booth environment:

      Site 1 (192.168.22.71):
      fastvm-rhel-8-0-23
      fastvm-rhel-8-0-24

      Site 2 (192.168.22.81):
      fastvm-rhel-8-0-33
      fastvm-rhel-8-0-34

      Arbitrator (192.168.22.52):
      fastvm-rhel-8-0-52

      1. # ticket configuration
        [root@fastvm-rhel-8-0-23 ~]# cat /etc/booth/booth.conf
        ~~~
        ...
        ticket = "apacheticket"
        expire = 120
        renewal-freq = 60
        retries = 4
        timeout = 10
        before-acquire-handler = /tmp/before-acquire-handler.sh
        ~~~
      2. # before-acquire-handler at site 1
         # site 2 is the same but uses booth_ip=192.168.22.81
         # Exit 0 if SAFE_TO_ACTIVATE geostore attribute is set to 1, else exit 1
        [root@fastvm-rhel-8-0-23 ~]# cat /tmp/before-acquire-handler.sh
        ~~~
        #!/bin/bash

        begin_date=$(date)
        h_name=$(hostname -s)
        ticket_name=apacheticket
        booth_ip=192.168.22.71

        SAFE_TO_ACTIVATE=$(geostore get -t "$ticket_name" -s "$booth_ip" SAFE_TO_ACTIVATE)

        echo "$begin_date - $h_name: Floating Cluster IP=$booth_ip, Ticket Name=$ticket_name, ACTIVATE=$SAFE_TO_ACTIVATE" >> /tmp/before-acquire-handler.log

        [[ $SAFE_TO_ACTIVATE -eq 1 ]] && exit 0 || exit 1
        ~~~

      3. # Initially, SAFE_TO_ACTIVATE=0 at both sites
        [root@fastvm-rhel-8-0-23 ~]# geostore get -t apacheticket -s 192.168.22.71 SAFE_TO_ACTIVATE
        0
        [root@fastvm-rhel-8-0-23 ~]# geostore get -t apacheticket -s 192.168.22.81 SAFE_TO_ACTIVATE
        0
      4. # Grant fails as it should because SAFE_TO_ACTIVATE != 1
        [root@fastvm-rhel-8-0-23 ~]# pcs booth ticket grant apacheticket 192.168.22.71
        Error: unable to grant booth ticket 'apacheticket' for site '192.168.22.71', reason: Jan 10 20:05:57 fastvm-rhel-8-0-23 booth: [4856]: info: grant request sent, waiting for the result ...
        Jan 10 20:05:57 fastvm-rhel-8-0-23 booth: [4856]: error: before-acquire-handler for ticket "apacheticket" failed, grant denied
      5. # Set SAFE_TO_ACTIVATE=1 at site 1
        [root@fastvm-rhel-8-0-23 ~]# geostore set -t apacheticket -s 192.168.22.71 SAFE_TO_ACTIVATE 1
        Jan 10 20:06:49 fastvm-rhel-8-0-23 geostore: [5278]: info: set succeeded!
      6. # Grant succeeds as it should
        [root@fastvm-rhel-8-0-23 ~]# pcs booth ticket grant apacheticket 192.168.22.71
        [root@fastvm-rhel-8-0-23 ~]# booth list
        ticket: apacheticket, leader: 192.168.22.71, expires: 2020-01-10 20:09:38
      7. # Set SAFE_TO_ACTIVATE=0 so that the before-acquire-handler fails upon ticket renewal
        [root@fastvm-rhel-8-0-23 ~]# geostore set -t apacheticket -s 192.168.22.71 SAFE_TO_ACTIVATE 0
        Jan 10 20:07:56 fastvm-rhel-8-0-23 geostore: [5810]: info: set succeeded!
      8. # Watch it fail at renewal
        [root@fastvm-rhel-8-0-23 ~]# tail -f /var/log/messages
        ...
        Jan 10 20:08:38 fastvm-rhel-8-0-23 boothd-site[31555]: [warning] apacheticket (Lead/19/59806): handler "/tmp/before-acquire-handler.sh" failed: exit code 1
        Jan 10 20:08:38 fastvm-rhel-8-0-23 boothd-site[31555]: [warning] apacheticket (Lead/19/59806): we are not allowed to acquire ticket
        Jan 10 20:08:38 fastvm-rhel-8-0-23 crm_ticket[6181]: notice: Additional logging available in /var/log/pacemaker/pacemaker.log
        Jan 10 20:08:38 fastvm-rhel-8-0-23 crm_ticket[6181]: notice: Invoked: crm_ticket -t apacheticket -S owner -v 0
        Jan 10 20:08:38 fastvm-rhel-8-0-23 pacemaker-controld[11242]: notice: State transition S_IDLE -> S_POLICY_ENGINE
        Jan 10 20:08:38 fastvm-rhel-8-0-23 crm_ticket[6200]: notice: Additional logging available in /var/log/pacemaker/pacemaker.log
        Jan 10 20:08:38 fastvm-rhel-8-0-23 crm_ticket[6200]: notice: Invoked: crm_ticket -t apacheticket -S expires -v 1578715718
        Jan 10 20:08:38 fastvm-rhel-8-0-23 pacemaker-controld[11242]: notice: Transition 303 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-351.bz2): Complete
        Jan 10 20:08:38 fastvm-rhel-8-0-23 pacemaker-controld[11242]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
        Jan 10 20:08:38 fastvm-rhel-8-0-23 pacemaker-controld[11242]: notice: State transition S_IDLE -> S_POLICY_ENGINE
        Jan 10 20:08:38 fastvm-rhel-8-0-23 pacemaker-schedulerd[11241]: notice: Calculated transition 303, saving inputs in /var/lib/pacemaker/pengine/pe-input-351.bz2
        Jan 10 20:08:38 fastvm-rhel-8-0-23 crm_ticket[6210]: notice: Additional logging available in /var/log/pacemaker/pacemaker.log
        Jan 10 20:08:38 fastvm-rhel-8-0-23 crm_ticket[6210]: notice: Invoked: crm_ticket -t apacheticket -S term -v 19
        Jan 10 20:08:38 fastvm-rhel-8-0-23 boothd-site[31555]: [info] setting crm_ticket attributes successful
        Jan 10 20:08:38 fastvm-rhel-8-0-23 pacemaker-schedulerd[11241]: notice: Calculated transition 304, saving inputs in /var/lib/pacemaker/pengine/pe-input-352.bz2
        Jan 10 20:08:38 fastvm-rhel-8-0-23 crm_ticket[6217]: notice: Additional logging available in /var/log/pacemaker/pacemaker.log
        Jan 10 20:08:38 fastvm-rhel-8-0-23 crm_ticket[6217]: notice: Invoked: crm_ticket -t apacheticket -r --force
        Jan 10 20:08:38 fastvm-rhel-8-0-23 pacemaker-controld[11242]: notice: Transition 304 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-352.bz2): Complete
        Jan 10 20:08:38 fastvm-rhel-8-0-23 pacemaker-controld[11242]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
        Jan 10 20:08:38 fastvm-rhel-8-0-23 pacemaker-controld[11242]: notice: State transition S_IDLE -> S_POLICY_ENGINE
        Jan 10 20:08:38 fastvm-rhel-8-0-23 pacemaker-schedulerd[11241]: notice: * Stop dummy1 ( fastvm-rhel-8-0-23 ) due to node availability
      9. # Site 2 acquires the ticket without running before-acquire-handler
        Jan 10 20:08:37 fastvm-rhel-8-0-33 boothd-site[20210]: [info] apacheticket (Fllw/19/58738): 192.168.22.71 wants to give the ticket away (ticket release)
        Jan 10 20:08:37 fastvm-rhel-8-0-33 boothd-site[20210]: [info] setting crm_ticket attributes successful
        Jan 10 20:08:38 fastvm-rhel-8-0-33 boothd-site[20210]: [info] apacheticket (Fllw/20/0): starting new election (term=20)
        Jan 10 20:08:38 fastvm-rhel-8-0-33 boothd-site[20210]: [info] apacheticket (Lead/20/119999): granted successfully here
        Jan 10 20:08:38 fastvm-rhel-8-0-33 boothd-site[20210]: [info] setting crm_ticket attributes successful
        Jan 10 20:08:38 fastvm-rhel-8-0-33 pacemaker-controld[1365]: notice: Result of start operation for dummy1 on fastvm-rhel-8-0-33: 0 (ok)
      10. # Site 2 fails before-acquire-handler upon ticket renewal
        Jan 10 20:09:38 fastvm-rhel-8-0-33 boothd-site[20210]: [warning] apacheticket (Lead/20/59808): handler "/tmp/before-acquire-handler.sh" failed: exit code 1
        Jan 10 20:09:38 fastvm-rhel-8-0-33 boothd-site[20210]: [warning] apacheticket (Lead/20/59807): we are not allowed to acquire ticket
        Jan 10 20:09:38 fastvm-rhel-8-0-33 boothd-site[20210]: [info] setting crm_ticket attributes successful
        Jan 10 20:09:38 fastvm-rhel-8-0-33 pacemaker-controld[1365]: notice: Result of stop operation for dummy1 on fastvm-rhel-8-0-33: 0 (ok)
        Jan 10 20:09:39 fastvm-rhel-8-0-33 boothd-site[20210]: [info] setting crm_ticket attributes successful
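
To put a number on the exposure, here is a minimal sketch that computes how long site 2 held the ticket before its handler first ran. It uses the timestamps from the site 2 log lines above (grant at 20:08:38, first handler failure at 20:09:38). The helper name `to_seconds` is illustrative only, and the calculation assumes both timestamps fall on the same day.

```shell
#!/bin/sh
# Convert an HH:MM:SS timestamp to seconds since midnight.
# (to_seconds is a hypothetical helper, not part of booth.)
to_seconds() {
    h=${1%%:*}          # hours
    rest=${1#*:}
    m=${rest%%:*}       # minutes
    s=${rest#*:}        # seconds
    # Strip one leading zero so values like "08" are not read as octal.
    echo $(( ${h#0} * 3600 + ${m#0} * 60 + ${s#0} ))
}

renewal_freq=60             # renewal-freq from booth.conf in the demonstration
granted_at="20:08:38"       # "granted successfully here" at site 2
first_check_at="20:09:38"   # first handler failure logged at site 2

window=$(( $(to_seconds "$first_check_at") - $(to_seconds "$granted_at") ))
echo "unchecked window: ${window}s (renewal-freq: ${renewal_freq}s)"
```

With these values the gap equals renewal-freq, which matches the description: dummy1 ran at site 2 for a full renewal interval before the handler stopped it.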

      Version-Release number of selected component (if applicable):

      booth-core-1.0-5.f2d38ce.git.el8.x86_64
      booth-site-1.0-5.f2d38ce.git.el8.noarch
      pacemaker-2.0.2-3.el8_1.2.x86_64


      How reproducible:

      Always


      Steps to Reproduce:
      1. Configure a booth ticket with a before-acquire-handler. Ensure that the handler is set up to succeed at site 1 and to fail at site 2.
      2. Grant the ticket to site 1.
      3. Cause the before-acquire-handler to fail at ticket renewal time at site 1.
      4. Observe site 1 release the ticket and site 2 try to acquire it.
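
The steps above can be sketched as a short script, reusing the ticket name, site addresses, and commands from the demonstration. This is only a sketch under assumptions: the `DRY_RUN` switch and the `run` and `reproduce` helpers are hypothetical, and it assumes the handler from the demonstration is installed at both sites and that `geostore set` can be run against either site from the local node.

```shell
#!/bin/sh
# Sketch of the reproduction steps. By default it only prints the commands.
TICKET=apacheticket
SITE1=192.168.22.71
SITE2=192.168.22.81
DRY_RUN=${DRY_RUN:-1}   # hypothetical switch: 1 prints commands, 0 runs them

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reproduce() {
    # Step 1: handler must fail at site 2 and succeed at site 1.
    run geostore set -t "$TICKET" -s "$SITE2" SAFE_TO_ACTIVATE 0
    run geostore set -t "$TICKET" -s "$SITE1" SAFE_TO_ACTIVATE 1
    # Step 2: grant the ticket to site 1.
    run pcs booth ticket grant "$TICKET" "$SITE1"
    # Step 3: make the handler fail at the next renewal at site 1.
    run geostore set -t "$TICKET" -s "$SITE1" SAFE_TO_ACTIVATE 0
    # Step 4: after the next renewal (up to renewal-freq seconds later),
    # check which site leads the ticket.
    run booth list
}

reproduce
```

Setting `DRY_RUN=0` would execute the commands on a cluster node; the `booth list` call is only meaningful once the renewal at site 1 has happened.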


      Actual results:

      Site 2 does not run the before-acquire-handler and successfully acquires the released ticket.


      Expected results:

      Site 2 runs the before-acquire-handler and is not allowed to acquire the ticket.
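
A rough way to tell the actual result from the expected one is to scan the site 2 boothd log lines from around the release time. The sketch below is an assumption-laden check, not an official test: `check_site2_log` is a hypothetical helper, and it assumes a fixed booth would log the same "we are not allowed to acquire ticket" message at acquisition time that it currently logs at renewal.

```shell
#!/bin/sh
# check_site2_log: read site 2 log lines on stdin and classify the outcome.
# With SAFE_TO_ACTIVATE=0 at site 2, a grant means the bug reproduced.
check_site2_log() {
    log=$(cat)
    if printf '%s\n' "$log" | grep -q 'granted successfully here'; then
        echo "FAIL: ticket granted at site 2"
        return 1
    fi
    if printf '%s\n' "$log" | grep -q 'we are not allowed to acquire ticket'; then
        echo "PASS: handler blocked acquisition at site 2"
        return 0
    fi
    echo "UNKNOWN: no grant or handler message found"
    return 2
}

# Example input: the two election lines from the demonstration (actual result).
check_site2_log <<'EOF' || true
Jan 10 20:08:38 fastvm-rhel-8-0-33 boothd-site[20210]: [info] apacheticket (Fllw/20/0): starting new election (term=20)
Jan 10 20:08:38 fastvm-rhel-8-0-33 boothd-site[20210]: [info] apacheticket (Lead/20/119999): granted successfully here
EOF
```

The input should be limited to the lines around the release, since the later renewal failure at site 2 logs the same "not allowed" message.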


      Additional info:

      The customer who reported this through a Red Hat support case also filed it upstream; that issue is now closed: https://github.com/ClusterLabs/booth/issues/79

            rhn-support-clumens Christopher Lumens
            rhn-support-nwahl Reid Wahl
            Christopher Lumens
            Cluster QE