Task · Resolution: Done · rhel-ha · HA-PCMK Sprint #5
What were you trying to do that didn't work?
On Azure, we are validating the RHEL 10 OS for SAP workloads. We have set up a 2-node cluster with SBD as the stonith mechanism. The sbd systemd service logs the fatal error message shown below on both nodes.
We don't see any issue in SBD behavior during our testing, but we want to understand what this fatal internal error is about.
root@rh0dhdb00l025:~# systemctl status sbd
● sbd.service - Shared-storage based fencing daemon
Loaded: loaded (/usr/lib/systemd/system/sbd.service; enabled; preset: disabled)
Drop-In: /etc/systemd/system/sbd.service.d
└─sbd_delay_start.conf
Active: active (running) since Wed 2025-11-12 23:38:15 UTC; 21h ago
Invocation: 0bea9a513c454f56ab7309e3f64f6f5f
Docs: man:sbd(8)
Main PID: 7242 (sbd)
Tasks: 6 (limit: 1025784)
Memory: 19.6M (peak: 20.6M)
CPU: 1min 30.612s
CGroup: /system.slice/sbd.service
├─7242 "sbd: inquisitor"
├─7243 "sbd: watcher: /dev/disk/by-id/scsi-3600140568f22b8820e6462d8ed2d256e - slot: 1 - uuid: 3fc1f9d7-3af2-4592-8e45-c98897e67d51"
├─7244 "sbd: watcher: /dev/disk/by-id/scsi-36001405aed93b0201c940629159f2230 - slot: 1 - uuid: 606c9cf0-900b-4eb8-95d9-9a2e933a7250"
├─7245 "sbd: watcher: /dev/disk/by-id/scsi-3600140544c9ccfd0f134917b0d547ed6 - slot: 1 - uuid: 6003d801-8249-4361-acd1-ccd04cd51624"
├─7246 "sbd: watcher: Pacemaker"
└─7247 "sbd: watcher: Cluster"
Nov 12 23:38:14 rh0dhdb00l025 sbd[7244]: /dev/disk/by-id/scsi-36001405aed93b0201c940629159f2230: notice: servant_md: Monitoring slot 1 on disk /dev/disk/by-id/scsi-36001405aed93b0201c940629159f2230
Nov 12 23:38:14 rh0dhdb00l025 sbd[7245]: /dev/disk/by-id/scsi-3600140544c9ccfd0f134917b0d547ed6: notice: servant_md: Monitoring slot 1 on disk /dev/disk/by-id/scsi-3600140544c9ccfd0f134917b0d547ed6
Nov 12 23:38:14 rh0dhdb00l025 sbd[7247]: cluster: notice: servant_cluster: Monitoring corosync cluster health
Nov 12 23:38:14 rh0dhdb00l025 sbd[7247]: cluster: notice: verify_against_cmap_config: Corosync is in 2Node-mode
Nov 12 23:38:14 rh0dhdb00l025 sbd[7247]: cluster: error: log_assertion_as: pcmk_server_message_type: Triggered fatal assertion at servers.c:164 : (server > 0) && (server < PCMK_NELEM(server_info))
Nov 12 23:38:14 rh0dhdb00l025 sbd[7247]: cluster: notice: update_peer_state_iter: Node rh0dhdb00l025 state is now member | nodeid=1 previous=unknown source=crm_update_peer_proc
Nov 12 23:38:14 rh0dhdb00l025 sbd[7242]: notice: inquisitor_child: Servant cluster is healthy (age: 0)
Nov 12 23:38:15 rh0dhdb00l025 sbd[7242]: notice: watchdog_init: Using watchdog device '/dev/watchdog'
Nov 12 23:38:15 rh0dhdb00l025 systemd[1]: Started sbd.service - Shared-storage based fencing daemon.
Nov 12 23:38:19 rh0dhdb00l025 sbd[7242]: notice: inquisitor_child: Servant pcmk is healthy (age: 0)
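The assertion itself comes from pacemaker's servers.c: pcmk_server_message_type() bounds-checks an IPC server identifier against its server_info lookup table ((server > 0) && (server < PCMK_NELEM(server_info))), and the cluster watcher trips it once while connecting during startup. To confirm the assertion fires only at start and that the servants stay healthy afterwards, the journal can be filtered (a minimal check, assuming the unit is named sbd.service as shown above):
root@rh0dhdb00l025:~# journalctl -u sbd.service -b -p err --no-pager
root@rh0dhdb00l025:~# journalctl -u sbd.service -b --no-pager | grep -E 'inquisitor_child|servant'
The first command shows error-priority messages from the unit for the current boot (the assertion should appear exactly once, at service start); the second shows the ongoing health notices from the inquisitor and its servants.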
What is the impact of this issue to you?
We currently don't see any impact, but we don't know whether this fatal error message could cause problems in some edge case.
Please provide the package NVR for which the bug is seen:
root@rh0dhdb00l025:~# rpm -qa | grep -Ei "pacemaker|corosync|sbd|fence-agents-sbd"
corosynclib-3.1.9-1.el10_0.1.x86_64
pacemaker-schemas-3.0.0-5.1.el10_0.noarch
pacemaker-libs-3.0.0-5.1.el10_0.x86_64
pacemaker-cluster-libs-3.0.0-5.1.el10_0.x86_64
corosync-3.1.9-1.el10_0.1.x86_64
pacemaker-3.0.0-5.1.el10_0.x86_64
pacemaker-cli-3.0.0-5.1.el10_0.x86_64
sbd-1.5.2-1.el10.5.x86_64
fence-agents-sbd-4.16.0-5.el10_0.6.noarch
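To check whether a later pacemaker build already notes a fix for this assertion, the changelog of the installed packages can be scanned (a quick heuristic only; the split-off issue linked at the end of this report and its erratum are the authoritative record):
root@rh0dhdb00l025:~# rpm -q --changelog pacemaker-libs | head -n 30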
root@rh0dhdb00l025:~# more /etc/os-release
NAME="Red Hat Enterprise Linux"
VERSION="10.0 (Coughlan)"
ID="rhel"
ID_LIKE="centos fedora"
VERSION_ID="10.0"
PLATFORM_ID="platform:el10"
PRETTY_NAME="Red Hat Enterprise Linux 10.0 (Coughlan)"
ANSI_COLOR="0;31"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:redhat:enterprise_linux:10.0"
HOME_URL="https://www.redhat.com/"
VENDOR_NAME="Red Hat"
VENDOR_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/10"
BUG_REPORT_URL="https://issues.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 10"
REDHAT_BUGZILLA_PRODUCT_VERSION=10.0
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="10.0"
How reproducible is this bug?:
Every time
Steps to reproduce
- Attach the shared LUN(s) to both nodes of a two-node cluster
- Configure SBD (a verification sketch follows this list):
root@rh0dhdb00l025:~# more /etc/sysconfig/sbd | grep -v '#'
SBD_PACEMAKER=yes
SBD_STARTMODE=always
SBD_DELAY_START=186
SBD_WATCHDOG_DEV=/dev/watchdog
SBD_WATCHDOG_TIMEOUT=5
SBD_TIMEOUT_ACTION=flush,reboot
SBD_MOVE_TO_ROOT_CGROUP=auto
SBD_SYNC_RESOURCE_STARTUP=yes
SBD_OPTS=
SBD_DEVICE="/dev/disk/by-id/scsi-3600140568f22b8820e6462d8ed2d256e;/dev/disk/by-id/scsi-36001405aed93b0201c940629159f2230;/dev/disk/by-id/scsi-3600140544c9ccfd0f134917b0d547ed6"
- Set up the cluster
- Enable the SBD service: "systemctl enable sbd"
- Start the cluster; this also starts the SBD service
- Check the SBD service with "systemctl status sbd": the fatal error message appears
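Before starting the cluster, the watchdog and the on-disk SBD metadata can be verified per device (a minimal sketch using the first device from SBD_DEVICE above; repeat for the other two):
root@rh0dhdb00l025:~# sbd query-watchdog
root@rh0dhdb00l025:~# sbd -d /dev/disk/by-id/scsi-3600140568f22b8820e6462d8ed2d256e dump
root@rh0dhdb00l025:~# sbd -d /dev/disk/by-id/scsi-3600140568f22b8820e6462d8ed2d256e list
query-watchdog lists the watchdog devices sbd can use; dump prints the on-disk header (timeouts, slot count); list shows slot allocation and any pending messages per node.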
Expected results
The sbd service starts without logging a fatal assertion.
Actual results
The cluster watcher logs "Triggered fatal assertion at servers.c:164" on every start, on both nodes.
Split to: RHEL-128442 Fatal error message in SBD service