Type: Bug
Resolution: Unresolved
Priority: Critical
Fix Version/s: rhel-9.8
Affects Version/s: rhel-9.4, rhel-9.4.z
Component/s: pacemaker
Labels:
- Pacemaker
- fixed_upstream

Fixed in Build:
pacemaker-2.1.10-2.el9
Regression:
None
Severity:
Low
AssignedTeam:
rhel-ha

Story Points:
2
ACKs Check:

Dev ack
Blocked:
False
Ready:
False
Blocked Reason:

None
Product Documentation Required:
None
Sprint:
None

Preliminary Testing:
Pass
Testable Builds:
https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=3888683
Errata Link:
https://errata.engineering.redhat.com/advisory/156445
Test Coverage:

RegressionOnly

ProdDocsReview-CCS:
Unspecified
ProdDocsReview-Dev:
Unspecified
ProdDocsReview-QE:
Unspecified

Architecture:

All

PX Impact Score:
SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Planning:
None

What were you trying to do that didn't work?

When one of the Pacemaker subdaemons hangs (pacemaker-attrd in this case), Pacemaker tries five times to connect to the process, then kills it and respawns it. The problem is a small timing window between killing the process and respawning it. If pacemaker-controld tries to connect to the subdaemon after it has been killed but before it has finished respawning, the connection fails; controld treats that failure as a fatal error and shuts down the entire Pacemaker stack.
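
The timing window described above can be illustrated with a toy model. This is only a sketch, not Pacemaker code: `Supervisor`, `connect`, `client_update`, `run_timeline`, and `FatalError` are hypothetical names, and the model assumes a client that treats a single failed connection as fatal, as reported here.

```python
class FatalError(Exception):
    """Raised by the toy client when it gives up (hypothetical name)."""


class Supervisor:
    """Toy stand-in for the parent that kills and respawns a hung child."""

    def __init__(self):
        self.child_running = True

    def kill_child(self):
        # After the kill, nothing is listening until respawn_child() runs.
        self.child_running = False

    def respawn_child(self):
        self.child_running = True


def connect(supervisor):
    """Toy connection attempt: fails whenever the child is not running."""
    if not supervisor.child_running:
        raise ConnectionRefusedError("child is being respawned")
    return "connected"


def client_update(supervisor):
    """Client that treats one failed connection as fatal (the reported behavior)."""
    try:
        return connect(supervisor)
    except ConnectionRefusedError as exc:
        raise FatalError("cannot reach child, shutting down") from exc


def run_timeline(client_acts_in_gap):
    """Kill, optionally let the client act inside the gap, then respawn."""
    sup = Supervisor()
    sup.kill_child()
    outcome = None
    if client_acts_in_gap:
        try:
            outcome = client_update(sup)
        except FatalError:
            outcome = "fatal"
    sup.respawn_child()
    if outcome is None:
        # The client only acts after the respawn has completed.
        outcome = client_update(sup)
    return outcome
```

In this model the outcome depends only on whether the client happens to act between the kill and the respawn, which is consistent with the bug being hard to reproduce.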

What is the impact of this issue to you?

Pacemaker encounters a fatal error, shuts itself down, and does not recover without manual intervention.
Please provide the package NVR for which the bug is seen:

version 2.1.8-3.el9-3980678f0

How reproducible is this bug?:

Difficult to reproduce, because pacemaker-controld must interact with the killed subdaemon before it respawns.

Steps to reproduce

1. Send SIGSTOP (kill -SIGSTOP) to one of the Pacemaker subdaemons, pacemaker-attrd in this example.
2. Pacemaker logs that attrd is unresponsive to IPC and respawns it.
3. After attrd is killed and before it respawns, controld connects to attrd (to update the failure count, etc.).
4. pacemaker-controld fails to connect to the attrd daemon and shuts down the entire Pacemaker stack.
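
The first step relies on SIGSTOP suspending a process without terminating it, so the process stays alive but cannot answer requests. A minimal sketch of that mechanism follows, using a throwaway child process as a stand-in for pacemaker-attrd; `stop_and_check` and `demo` are hypothetical helper names, and the sketch assumes a POSIX system.

```python
import os
import signal
import subprocess
import sys


def stop_and_check(proc):
    """Send SIGSTOP to proc and report whether the kernel marks it as stopped."""
    os.kill(proc.pid, signal.SIGSTOP)
    # WUNTRACED makes waitpid return for a stopped (not exited) child.
    _, status = os.waitpid(proc.pid, os.WUNTRACED)
    return os.WIFSTOPPED(status)


def demo():
    # Stand-in child; in the real reproduction the target is pacemaker-attrd.
    proc = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    try:
        return stop_and_check(proc)
    finally:
        proc.kill()  # SIGKILL terminates even a stopped process
        proc.wait()
```

A stopped process keeps its PID and open sockets, so from the outside it looks hung rather than dead, which matches the "unresponsive to IPC" condition in the second step.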

Expected results

Pacemaker does not try to interact with a subdaemon that it has just killed and is in the process of respawning.
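
One general way for a client to tolerate a respawn window is to retry the connection a bounded number of times instead of failing on the first refusal. The sketch below shows that technique only; it is not the fix shipped in pacemaker-2.1.10-2.el9, and `connect_with_retry`, `make_flaky_connect`, and the default `attempts` and `delay` values are assumptions made for illustration.

```python
import time


def connect_with_retry(connect, attempts=5, delay=0.01, sleep=time.sleep):
    """Call connect() until it succeeds or the attempts are exhausted."""
    last_error = None
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionRefusedError as exc:
            last_error = exc
            if attempt < attempts - 1:
                sleep(delay)  # give the daemon time to finish respawning
    raise last_error


def make_flaky_connect(failures):
    """Return a connect() that refuses the first `failures` calls (demo helper)."""
    state = {"calls": 0}

    def connect():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise ConnectionRefusedError("daemon is respawning")
        return "connected"

    return connect
```

For example, `connect_with_retry(make_flaky_connect(2))` succeeds on the third call, while a daemon that never comes back still surfaces the error once the attempts run out.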

Actual results

Pacemaker interacts with the subdaemon it has just killed, and as a result the entire Pacemaker stack goes down.

Attachments:
- sosreport-9-4-testing-hadr-srv-1-87484-2025-04-16-mphiedl.tar.xz (14.98 MB, 2025/04/16 1:43 PM)
- sosreport-9-4-testing-hadr-srv-1-2025-04-16-qjcpksz.tar.xz (15.19 MB, 2025/04/16 2:25 PM)
- PCMK.log (883 kB, 2025/04/16 1:43 PM)

links to

pacemaker-attrd is unresponsive to IPC after 5 attempts resulting on the pacemaker-controld fatal error

RHBA-2025:156445 pacemaker update

Assignee: Christopher Lumens

Reporter: Dongho Han

Contributors: Chris Feist, Christopher Lumens, Dongho Han

Developer: Christopher Lumens

QA Contact: Jana Rehova

Votes: 2

Watchers: 15

Created: 2025/04/15 6:13 PM

Updated: 2026/02/13 4:36 PM
