
Type: Bug
Resolution: Unresolved
Priority: Undefined
Fix Version/s: None
Affects Version/s: rhel-9.4.z
Component/s: pacemaker
Labels: None
Regression: No
Severity: None
AssignedTeam: rhel-ha
Story Points: 13
Blocked: False
Ready: False
Blocked Reason: None
Product Documentation Required: None
Sprint: None
Preliminary Testing: None
Test Coverage: None
ProdDocsReview-CCS: Unspecified
ProdDocsReview-Dev: Unspecified
ProdDocsReview-QE: Unspecified
Architecture: x86_64
PX Impact Score:
SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:
Planning: None

What were you trying to do that didn't work? During a remove-host operation, in which we delete the resource definitions and constraints defined on a host, Pacemaker hit an error and unexpectedly shut itself down. These are the errors logged around the shutdown:

Apr 15 18:58:29.866 svtlnxps05 pacemaker-controld [53865] (do_lrm_rsc_op) error: Could not initiate start action for resource db2_instancehost_jstamko2 locally: No such device | rc=19

Apr 15 18:58:29.870 svtlnxps05 pacemaker-execd [53862] (process_lrmd_get_rsc_info) info: Agent information for 'db2_instancehost_jstamko2' not in cache

Apr 15 18:58:29.870 svtlnxps05 pacemaker-controld [53865] (process_lrm_event) error: Unable to record db2_instancehost_jstamko2_start_0 result in CIB: No resource information

Apr 15 18:58:29.870 svtlnxps05 pacemaker-controld [53865] (log_executor_event) error: Result of start operation for db2_instancehost_jstamko2 on svtlnxps05: Internal communication failure (No such device) | graph action unconfirmed; call=999999999 key=db2_instancehost_jstamko2_start_0

Apr 15 18:58:29.870 svtlnxps05 pacemaker-controld [53865] (register_fsa_error_adv) info: Resetting the current action list

Apr 15 18:58:29.870 svtlnxps05 pacemaker-controld [53865] (do_log) warning: Input I_FAIL received in state S_NOT_DC from do_lrm_rsc_op

Apr 15 18:58:29.870 svtlnxps05 pacemaker-controld [53865] (do_state_transition) notice: State transition S_NOT_DC -> S_RECOVERY | input=I_FAIL cause=C_FSA_INTERNAL origin=do_lrm_rsc_op

Apr 15 18:58:29.870 svtlnxps05 pacemaker-controld [53865] (do_recover) warning: Fast-tracking shutdown in response to errors

Apr 15 18:58:29.870 svtlnxps05 pacemaker-controld [53865] (do_log) error: Input I_TERMINATE received in state S_RECOVERY from do_recover

Apr 15 18:58:29.878 svtlnxps05 pacemaker-controld [53865] (crmd_fast_exit) error: Could not recover from internal error

Apr 15 18:58:29.882 svtlnxps05 pacemaker-controld [53865] (crm_exit) info: Exiting pacemaker-controld | with status 1

Apr 15 18:58:29.882 svtlnxps05 pacemakerd [53853] (pcmk_child_exit) error: pacemaker-controld[53865] exited with status 1 (Error occurred)
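The log excerpt above follows a fixed layout (timestamp, host, daemon, PID, function, level, message), which makes it easy to pull out the controller's state transitions and error sources when going through the full logs in the attached tarball. The following is a minimal, hypothetical triage sketch, not part of Pacemaker: the regular expression is assumed from the lines quoted in this report only, and the names `parse_line` and `summarize` are illustrative.

```python
import re
from typing import Iterable, List, NamedTuple, Optional, Tuple

# Hypothetical triage helper (not part of Pacemaker). Assumes the log layout
# seen in this report:
#   "<Mon> <DD> <HH:MM:SS.mmm> <host> <daemon> [<pid>] (<function>) <level>: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2} +\d+ [\d:.]+) (?P<host>\S+) (?P<daemon>\S+) "
    r"\[(?P<pid>\d+)\] \((?P<func>\w+)\) (?P<level>\w+): (?P<msg>.*)$"
)
# Controller state changes appear as "State transition S_X -> S_Y".
TRANSITION_RE = re.compile(r"State transition (?P<src>S_\w+) -> (?P<dst>S_\w+)")


class LogEntry(NamedTuple):
    ts: str
    host: str
    daemon: str
    pid: int
    func: str
    level: str
    msg: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Split one log line into fields; return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return LogEntry(m["ts"], m["host"], m["daemon"], int(m["pid"]),
                    m["func"], m["level"], m["msg"])


def summarize(lines: Iterable[str]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Return (state transitions, functions that logged at error level)."""
    transitions: List[Tuple[str, str]] = []
    error_funcs: List[str] = []
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # skip blank lines and anything not in the expected layout
        t = TRANSITION_RE.search(entry.msg)
        if t:
            transitions.append((t["src"], t["dst"]))
        if entry.level == "error":
            error_funcs.append(entry.func)
    return transitions, error_funcs


if __name__ == "__main__":
    sample = [
        "Apr 15 18:58:29.866 svtlnxps05 pacemaker-controld [53865] (do_lrm_rsc_op) error: "
        "Could not initiate start action for resource db2_instancehost_jstamko2 locally: "
        "No such device | rc=19",
        "Apr 15 18:58:29.870 svtlnxps05 pacemaker-controld [53865] (do_state_transition) notice: "
        "State transition S_NOT_DC -> S_RECOVERY | input=I_FAIL cause=C_FSA_INTERNAL "
        "origin=do_lrm_rsc_op",
        "Apr 15 18:58:29.878 svtlnxps05 pacemaker-controld [53865] (crmd_fast_exit) error: "
        "Could not recover from internal error",
    ]
    print(summarize(sample))
```

The pattern is only an assumption based on the excerpt above; other log destinations, such as the systemd journal, may use a different prefix, in which case `LINE_RE` would need adjusting.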

What is the impact of this issue to you? The drop node operation failed.

Please provide the package NVR for which the bug is seen: 2.1.9-1

How reproducible is this bug?: Not sure; it has only been hit once so far.

Steps to reproduce

1. Set up a cluster consisting of 4 hosts.
2. From one host, run operations that delete the resource definitions and remove the resource constraints defined on another host.
3. When the error is hit, Pacemaker shuts down and restarts itself. However, the node was fenced during the process, which resulted in unexpected behaviour.

Expected results: No error when cleaning up resources on a host.

Actual results: Pacemaker unexpectedly shut down and restarted.


Attachment: unexpected-pacemaker-error-shutdown.tar.bz2 (10.19 MB, 2025/04/16 8:22 PM)

Issue links: is duplicated by RHEL-109543 (Closed): Internal communication error during resource create causes controld to shutdown and fails resource start

Assignee: Chris Feist
Reporter: Kwonmin Bok (Inactive)
Developer: Chris Feist
QA Contact: HA Sustaining
Votes: 0
Watchers: 6
Created: 2025/04/16 8:09 PM
Updated: 2025/11/10 6:58 PM
Stale Date: 2026/04/15
