RHEL-130732: Fatal error message in SBD service (devel work)


    • Type: Task
    • Resolution: Done
    • Priority: Undefined
    • Component: sbd
    • Assigned Team: rhel-ha
    • Sprint: HA-PCMK Sprint #5

      What were you trying to do that didn't work?

      On Azure, we are validating the RHEL 10 OS for SAP workloads. We have set up a two-node cluster with SBD as the STONITH mechanism. The sbd systemd service logs the error message shown below (the log_assertion_as line in the status output) on both nodes.

      We do not see any issues in SBD behavior during our testing, but we want to understand what this fatal internal error means.

      root@rh0dhdb00l025:~# systemctl status sbd
      ● sbd.service - Shared-storage based fencing daemon
           Loaded: loaded (/usr/lib/systemd/system/sbd.service; enabled; preset: disabled)
          Drop-In: /etc/systemd/system/sbd.service.d
                   └─sbd_delay_start.conf
           Active: active (running) since Wed 2025-11-12 23:38:15 UTC; 21h ago
       Invocation: 0bea9a513c454f56ab7309e3f64f6f5f
             Docs: man:sbd(8)
         Main PID: 7242 (sbd)
            Tasks: 6 (limit: 1025784)
           Memory: 19.6M (peak: 20.6M)
              CPU: 1min 30.612s
           CGroup: /system.slice/sbd.service
                   ├─7242 "sbd: inquisitor"
                   ├─7243 "sbd: watcher: /dev/disk/by-id/scsi-3600140568f22b8820e6462d8ed2d256e - slot: 1 - uuid: 3fc1f9d7-3af2-4592-8e45-c98897e67d51"
                   ├─7244 "sbd: watcher: /dev/disk/by-id/scsi-36001405aed93b0201c940629159f2230 - slot: 1 - uuid: 606c9cf0-900b-4eb8-95d9-9a2e933a7250"
                   ├─7245 "sbd: watcher: /dev/disk/by-id/scsi-3600140544c9ccfd0f134917b0d547ed6 - slot: 1 - uuid: 6003d801-8249-4361-acd1-ccd04cd51624"
                   ├─7246 "sbd: watcher: Pacemaker"
                   └─7247 "sbd: watcher: Cluster"

      Nov 12 23:38:14 rh0dhdb00l025 sbd[7244]: /dev/disk/by-id/scsi-36001405aed93b0201c940629159f2230:   notice: servant_md: Monitoring slot 1 on disk /dev/disk/by-id/scsi-36001405aed93b0201c940629159f2230
      Nov 12 23:38:14 rh0dhdb00l025 sbd[7245]: /dev/disk/by-id/scsi-3600140544c9ccfd0f134917b0d547ed6:   notice: servant_md: Monitoring slot 1 on disk /dev/disk/by-id/scsi-3600140544c9ccfd0f134917b0d547ed6
      Nov 12 23:38:14 rh0dhdb00l025 sbd[7247]:    cluster:   notice: servant_cluster: Monitoring corosync cluster health
      Nov 12 23:38:14 rh0dhdb00l025 sbd[7247]:    cluster:   notice: verify_against_cmap_config: Corosync is in 2Node-mode
      Nov 12 23:38:14 rh0dhdb00l025 sbd[7247]:    cluster:    error: log_assertion_as: pcmk_server_message_type: Triggered fatal assertion at servers.c:164 : (server > 0) && (server < PCMK_NELEM(server_info))
      Nov 12 23:38:14 rh0dhdb00l025 sbd[7247]:    cluster:   notice: update_peer_state_iter: Node rh0dhdb00l025 state is now member | nodeid=1 previous=unknown source=crm_update_peer_proc
      Nov 12 23:38:14 rh0dhdb00l025 sbd[7242]:   notice: inquisitor_child: Servant cluster is healthy (age: 0)
      Nov 12 23:38:15 rh0dhdb00l025 sbd[7242]:   notice: watchdog_init: Using watchdog device '/dev/watchdog'
      Nov 12 23:38:15 rh0dhdb00l025 systemd[1]: Started sbd.service - Shared-storage based fencing daemon.
      Nov 12 23:38:19 rh0dhdb00l025 sbd[7242]:   notice: inquisitor_child: Servant pcmk is healthy (age: 0)
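      Reading the assertion text itself, pcmk_server_message_type appears to look up the IPC server enum value in a static server_info table and asserts (server > 0) && (server < PCMK_NELEM(server_info)); a value of 0 (the "unknown" server) or one past the end of the table would trip the fatal log. A hedged way to capture every occurrence for this report (the journalctl/grep pipeline is ours, not part of the original output):

      # Pull the full sbd journal for the current boot and isolate the assertion lines.
      root@rh0dhdb00l025:~# journalctl -u sbd -b --no-pager | grep -iE 'fatal|assert'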

      What is the impact of this issue to you?

      We currently do not see any impact, but we do not know whether this fatal error message could cause problems in edge cases.

      Please provide the package NVR for which the bug is seen:

       

      root@rh0dhdb00l025:~# rpm -qa | grep -Ei "pacemaker|corosync|sbd|fence-agents-sbd"
      corosynclib-3.1.9-1.el10_0.1.x86_64
      pacemaker-schemas-3.0.0-5.1.el10_0.noarch
      pacemaker-libs-3.0.0-5.1.el10_0.x86_64
      pacemaker-cluster-libs-3.0.0-5.1.el10_0.x86_64
      corosync-3.1.9-1.el10_0.1.x86_64
      pacemaker-3.0.0-5.1.el10_0.x86_64
      pacemaker-cli-3.0.0-5.1.el10_0.x86_64
      sbd-1.5.2-1.el10.5.x86_64
      fence-agents-sbd-4.16.0-5.el10_0.6.noarch
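
      For a more direct capture of the same NVRs, querying the packages by name also works (this targets the same package set the grep above matched):

      root@rh0dhdb00l025:~# rpm -q corosync corosynclib pacemaker sbd fence-agents-sbd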

      root@rh0dhdb00l025:~# more /etc/os-release 
      NAME="Red Hat Enterprise Linux"
      VERSION="10.0 (Coughlan)"
      ID="rhel"
      ID_LIKE="centos fedora"
      VERSION_ID="10.0"
      PLATFORM_ID="platform:el10"
      PRETTY_NAME="Red Hat Enterprise Linux 10.0 (Coughlan)"
      ANSI_COLOR="0;31"
      LOGO="fedora-logo-icon"
      CPE_NAME="cpe:/o:redhat:enterprise_linux:10.0"
      HOME_URL="https://www.redhat.com/"
      VENDOR_NAME="Red Hat"
      VENDOR_URL="https://www.redhat.com/"
      DOCUMENTATION_URL="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/10"
      BUG_REPORT_URL="https://issues.redhat.com/"

      REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 10"
      REDHAT_BUGZILLA_PRODUCT_VERSION=10.0
      REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
      REDHAT_SUPPORT_PRODUCT_VERSION="10.0"

      How reproducible is this bug?:

      Every time

       

      Steps to reproduce

      1. Attach the shared LUN(s) to a two-node cluster.
      2. Configure SBD (a device-verification sketch follows these steps):

      root@rh0dhdb00l025:~# more /etc/sysconfig/sbd | grep -v '#'

      SBD_PACEMAKER=yes
      SBD_STARTMODE=always
      SBD_DELAY_START=186
      SBD_WATCHDOG_DEV=/dev/watchdog
      SBD_WATCHDOG_TIMEOUT=5
      SBD_TIMEOUT_ACTION=flush,reboot
      SBD_MOVE_TO_ROOT_CGROUP=auto
      SBD_SYNC_RESOURCE_STARTUP=yes
      SBD_OPTS=
      SBD_DEVICE="/dev/disk/by-id/scsi-3600140568f22b8820e6462d8ed2d256e;/dev/disk/by-id/scsi-36001405aed93b0201c940629159f2230;/dev/disk/by-id/scsi-3600140544c9ccfd0f134917b0d547ed6"

      3. Set up the cluster.
      4. Enable the SBD service: systemctl enable sbd
      5. Start the cluster; this also starts the SBD service.
      6. Check the SBD service with systemctl status sbd; the fatal assertion message appears in the log output.
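
      A hedged verification sketch, assuming the three device paths from SBD_DEVICE above: dump each device header (including the watchdog and msgwait timeouts) and list the allocated slots, which should match the slot numbers reported by the watchers in the status output:

      # Dump the SBD header and slot table of each configured device.
      root@rh0dhdb00l025:~# for dev in /dev/disk/by-id/scsi-3600140568f22b8820e6462d8ed2d256e \
              /dev/disk/by-id/scsi-36001405aed93b0201c940629159f2230 \
              /dev/disk/by-id/scsi-3600140544c9ccfd0f134917b0d547ed6; do
              sbd -d "$dev" dump && sbd -d "$dev" list; done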

      Expected results

      No fatal assertion messages in the sbd service log.

      Actual results

      The cluster watcher logs a fatal assertion (pcmk_server_message_type at servers.c:164) on startup on both nodes, although SBD behavior is otherwise unaffected.

        Assignee: Klaus Wenninger (rhn-engineering-kwenning)
        Reporter: Klaus Wenninger (rhn-engineering-kwenning)
        Microsoft Confidential Group
        Votes: 0
        Watchers: 3
