RHEL-104412

Booth Arbitrator Hangs with default config in RHEL 9.6


    • Yes
    • Important
    • ZStream
    • rhel-ha
    • 0
    • False
    • False
    • None
    • None
    • Regression Exception
    • None
    • None
    • Unspecified
    • Unspecified
    • Unspecified
    • None

      What were you trying to do that didn't work?

      Just a standard configuration / setup for booth multi-site

      What is the impact of this issue to you?

      • Cluster Nodes are not able to connect to booth arbitrator.
      • Booth arbitrator hangs after start.

      Please provide the package NVR for which the bug is seen:

      booth-core-1.1-2.el9.x86_64
      booth-arbitrator-1.1-2.el9.noarch

      kernel-5.14.0-570.12.1.el9_6.x86_64

       

      How reproducible is this bug?:

      100%

      Steps to reproduce

      The issue appears to happen by default with this kernel and booth version. Only the arbitrator is needed to reproduce it; no other cluster nodes are required:

      1. Install kernel version 5.14.0-570.12.1.el9_6 or higher and load the kernel:
      [root@kvm-03-guest07 ~]# dnf update kernel-5.14.0-570.12.1.el9_6
      Updating Subscription Management repositories.
      Red Hat Enterprise Linux 9 for x86_64 - AppStream (RPMs)                                                              46 MB/s |  64 MB     00:01    
      Red Hat Enterprise Linux 9 for x86_64 - BaseOS (RPMs)                                                                 43 MB/s |  67 MB     00:01    
      Last metadata expiration check: 0:00:01 ago on Fri 18 Jul 2025 04:02:07 PM EDT.
      Dependencies resolved.
      =====================================================================================================================================================
       Package                            Architecture          Version                                 Repository                                    Size
      =====================================================================================================================================================
      Installing:
       kernel                             x86_64                5.14.0-570.12.1.el9_6                   rhel-9-for-x86_64-baseos-rpms                1.8 M
      2. Install pcs, booth-core, and booth-arbitrator:
      [root@kvm-03-guest07 ~]# dnf install -y pcs booth-core booth-arbitrator
      Updating Subscription Management repositories.
      Last metadata expiration check: 0:02:42 ago on Fri 18 Jul 2025 04:02:07 PM EDT.
      Dependencies resolved.
      =====================================================================================================================================================
       Package                                   Architecture      Version                               Repository                                   Size
      =====================================================================================================================================================
      Installing:
       booth-arbitrator                          noarch            1.1-2.el9                             beaker-HighAvailability                      11 k
       booth-core                                x86_64            1.1-2.el9                             beaker-HighAvailability                     157 k
       pcs                                       x86_64            0.11.9-2.el9                          beaker-HighAvailability                     4.6 M
      3. Open the firewall for HA services:
      [root@kvm-03-guest07 ~]# firewall-cmd --permanent --add-service=high-availability
      [root@kvm-03-guest07 ~]# firewall-cmd --add-service=high-availability
      4. Set up and start booth on the arbitrator:
      [root@kvm-03-guest07 ~]# pcs booth setup sites 10.8.1.81 10.8.1.82 arbitrators 10.8.1.78
      [root@kvm-03-guest07 ~]# pcs booth start
      booth@booth started
      5. After this, status check commands hang on the arbitrator, and other nodes are unable to connect:
      [root@kvm-03-guest07 ~]# booth status -c booth -D
      Jul 18 16:09:02 kvm-03-guest07.lab.eng.rdu2.redhat.com booth: [37785]: debug: reading config file /etc/booth/booth.conf
      Jul 18 16:09:02 kvm-03-guest07.lab.eng.rdu2.redhat.com booth: [37785]: debug: read key of size 64 in authfile /etc/booth/booth.key
      ^C   
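
The hang in the last step can be detected without a manual ^C by running the status command under a time limit. The following is a minimal sketch, assuming GNU coreutils `timeout` is available; `check_hang` is a hypothetical helper name, not part of booth or pcs, and the 10-second limit in the usage comment is an arbitrary choice.

```shell
# Run a command under a time limit and report whether it hung.
# check_hang is a hypothetical helper, not part of booth or pcs.
check_hang() {
    local limit="$1"
    shift
    local rc
    # timeout(1) exits with status 124 when the time limit is reached.
    timeout "$limit" "$@" >/dev/null 2>&1
    rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "hung"
    else
        echo "completed (exit $rc)"
    fi
}

# On the arbitrator, one might run:
#   check_hang 10 booth status -c booth -D
```

On an affected system this would be expected to print `hung`; where the status command returns normally, it prints the command's exit status instead.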

      Expected results

      Booth arbitrator status should show started, and other nodes should be able to connect.

      Actual results

      Status commands hang and other nodes are not able to connect to arbitrator.

      Additional Notes

      • This issue doesn't occur if you disable IPv6 connections:
        • Disable ipv6:
          • [root@kvm-03-guest07 ~]# echo 1 > /proc/sys/net/ipv6/conf/all/disable_ipv6
            [root@kvm-03-guest07 ~]# grep . $( find /proc/ | grep disable_ipv6 )
            /proc/sys/net/ipv6/conf/all/disable_ipv6:1
            /proc/sys/net/ipv6/conf/default/disable_ipv6:1
            /proc/sys/net/ipv6/conf/ens3/disable_ipv6:1
            /proc/sys/net/ipv6/conf/lo/disable_ipv6:1
        • Start booth:
          • [root@kvm-03-guest07 ~]# pcs booth start
            booth@booth started    

                              

        •  Status command now completes without hanging:
          • [root@kvm-03-guest07 ~]# booth status -c booth -D
            Jul 18 16:39:03 kvm-03-guest07.lab.eng.rdu2.redhat.com booth: [1825]: debug: reading config file /etc/booth/booth.conf
            Jul 18 16:39:03 kvm-03-guest07.lab.eng.rdu2.redhat.com booth: [1825]: debug: read key of size 64 in authfile /etc/booth/booth.key
            Jul 18 16:39:03 kvm-03-guest07.lab.eng.rdu2.redhat.com booth: [1825]: debug: found myself at 10.8.1.78 (32 bits matched)
            Jul 18 16:39:03 kvm-03-guest07.lab.eng.rdu2.redhat.com booth: [1825]: debug: not running: No PID file.   
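
Before retesting with the IPv6 workaround, it may help to confirm that no interface still has IPv6 enabled. The sketch below is one way to do that, assuming the same `/proc/sys/net/ipv6/conf/*/disable_ipv6` files shown in the grep output above; `ipv6_enabled_ifaces` is a hypothetical helper name, and its optional argument exists only so the function can be pointed at a different directory.

```shell
# List the entries under the IPv6 sysctl directory where IPv6 is still enabled.
# ipv6_enabled_ifaces is a hypothetical helper; the optional first argument
# overrides the directory that is scanned.
ipv6_enabled_ifaces() {
    local base="${1:-/proc/sys/net/ipv6/conf}"
    local f
    for f in "$base"/*/disable_ipv6; do
        [ -r "$f" ] || continue
        # A value of 0 means IPv6 is still enabled for that entry.
        if [ "$(cat "$f")" = "0" ]; then
            basename "$(dirname "$f")"
        fi
    done
}

# Usage: prints nothing when every entry reports disable_ipv6 = 1.
#   ipv6_enabled_ifaces
```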

             

      • The issue doesn't occur on the latest RHEL 9.5 kernel:
        • [root@kvm-03-guest07 ~]# uname -r
          5.14.0-503.40.1.el9_5.x86_64
        • [root@kvm-03-guest07 ~]# booth status -c booth -D
          Jul 18 17:14:52 kvm-03-guest07.lab.eng.rdu2.redhat.com booth: [4596]: debug: reading config file /etc/booth/booth.conf
          Jul 18 17:14:52 kvm-03-guest07.lab.eng.rdu2.redhat.com booth: [4596]: debug: read key of size 64 in authfile /etc/booth/booth.key
          Jul 18 17:14:52 kvm-03-guest07.lab.eng.rdu2.redhat.com booth: [4596]: debug: found myself at 10.8.1.78 (32 bits matched)
          booth_lockpid=738 booth_lockfile='/var/run/booth//booth.pid' booth_pid=738 booth_state=started booth_type=arbitrator booth_cfg_name='booth' booth_id=1217737494 booth_addr_string='10.8.1.78' booth_port=992

           

      • This is likely a kernel-level issue: previous kernels don't hit it, and strace output shows the status command hanging in a recvmsg system call on a netlink socket:
      $ cat itdclmquorump1-status-strace.txt
      -----------------------------------------8<-----------------------------------------
      515239 10:31:49.437789 sendto(3<NETLINK:[11763095]>, [{nlmsg_len=20, nlmsg_type=0x16 /* NLMSG_??? */, nlmsg_flags=NLM_F_REQUEST|0x300, nlmsg_seq=1, nlmsg_pid=0}, "\x0a\x00\x00\x00"], 20, 0, {sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, 12) = 20 <0.000017>
      
      515239 10:31:49.437867 recvmsg(3<NETLINK:[ROUTE:515239]>, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[[{nlmsg_len=72, nlmsg_type=RTM_NEWADDR, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1, nlmsg_pid=515239}, {ifa_family=AF_INET6, ifa_prefixlen=128, ifa_flags=IFA_F_PERMANENT, ifa_scope=RT_SCOPE_HOST, ifa_index=if_nametoindex("lo")}, [[{nla_len=20, nla_type=IFA_ADDRESS}, inet_pton(AF_INET6, "::1")], [{nla_len=20, nla_type=IFA_CACHEINFO}, {ifa_prefered=4294967295, ifa_valid=4294967295, cstamp=91, tstamp=91}], [{nla_len=8, nla_type=IFA_FLAGS}, IFA_F_PERMANENT]]], [{nlmsg_len=72, nlmsg_type=RTM_NEWADDR, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1, nlmsg_pid=515239}, {ifa_family=AF_INET6, ifa_prefixlen=64, ifa_flags=IFA_F_PERMANENT, ifa_scope=RT_SCOPE_LINK, ifa_index=if_nametoindex("eth0")}, [[{nla_len=20, nla_type=IFA_ADDRESS}, inet_pton(AF_INET6, "fe80::250:56ff:fe8e:1fe2")], [{nla_len=20, nla_type=IFA_CACHEINFO}, {ifa_prefered=4294967295, ifa_valid=4294967295, cstamp=545, tstamp=545}], [{nla_len=8, nla_type=IFA_FLAGS}, IFA_F_PERMANENT]]], [{nlmsg_len=20, nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1, nlmsg_pid=515239}, 0]], iov_len=16384}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 164 <0.000013>
      
      515239 10:31:49.437954 recvmsg(3<NETLINK:[ROUTE:515239]>, {msg_namelen=12}, 0) = ? ERESTARTSYS (To be restarted if SA_RESTART is set) <208.237169>   <--- This never returns
      
      515239 10:35:17.675255 --- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL} ---
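
Long-blocking calls like the recvmsg above can be picked out of a larger strace capture mechanically. The following is a rough sketch, assuming the trace was taken with per-call durations and has the same layout as the excerpt (PID, timestamp, then the call, with the duration in angle brackets); `slow_syscalls` is a hypothetical helper name and the 1-second default threshold is arbitrary.

```shell
# Print syscalls from strace output whose duration exceeds a threshold.
# slow_syscalls is a hypothetical helper. It assumes the line layout
#   PID TIMESTAMP syscall(args...) = result <seconds>
# as seen in the excerpt above.
slow_syscalls() {
    local threshold="${1:-1}"
    awk -v t="$threshold" '
    {
        d = ""
        # The duration is a whitespace-separated field of the form <seconds>.
        for (i = 1; i <= NF; i++)
            if ($i ~ /^<[0-9]+\.[0-9]+>$/)
                d = substr($i, 2, length($i) - 2)
        if (d != "" && d + 0 > t + 0) {
            call = $3
            sub(/\(.*/, "", call)   # keep only the syscall name
            print $1, call, d
        }
    }'
}

# Usage, with the file name from the excerpt:
#   slow_syscalls 1 < itdclmquorump1-status-strace.txt
```

Each output line has the PID, the syscall name, and the time spent in the call, so a call that blocked until interrupted stands out by its duration.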

              mnovacek@redhat.com Michal Nováček
              rhn-support-jobaker Joshua Baker
              Reid Wahl Reid Wahl
              Michal Nováček Michal Nováček