Type: Bug
Resolution: Unresolved
Version: rhel-9.4.z
Component: rhel-ha
Architecture: x86_64
What were you trying to do that didn't work? During a remove-host operation, in which we remove the resource definitions and constraints defined on a host, Pacemaker hit an error and unexpectedly shut itself down. Here are the errors around the shutdown event:
Apr 15 18:58:29.866 svtlnxps05 pacemaker-controld [53865] (do_lrm_rsc_op) error: Could not initiate start action for resource db2_instancehost_jstamko2 locally: No such device | rc=19
Apr 15 18:58:29.870 svtlnxps05 pacemaker-execd [53862] (process_lrmd_get_rsc_info) info: Agent information for 'db2_instancehost_jstamko2' not in cache
Apr 15 18:58:29.870 svtlnxps05 pacemaker-controld [53865] (process_lrm_event) error: Unable to record db2_instancehost_jstamko2_start_0 result in CIB: No resource information
Apr 15 18:58:29.870 svtlnxps05 pacemaker-controld [53865] (log_executor_event) error: Result of start operation for db2_instancehost_jstamko2 on svtlnxps05: Internal communication failure (No such device) | graph action unconfirmed; call=999999999 key=db2_instancehost_jstamko2_start_0
Apr 15 18:58:29.870 svtlnxps05 pacemaker-controld [53865] (register_fsa_error_adv) info: Resetting the current action list
Apr 15 18:58:29.870 svtlnxps05 pacemaker-controld [53865] (do_log) warning: Input I_FAIL received in state S_NOT_DC from do_lrm_rsc_op
Apr 15 18:58:29.870 svtlnxps05 pacemaker-controld [53865] (do_state_transition) notice: State transition S_NOT_DC -> S_RECOVERY | input=I_FAIL cause=C_FSA_INTERNAL origin=do_lrm_rsc_op
Apr 15 18:58:29.870 svtlnxps05 pacemaker-controld [53865] (do_recover) warning: Fast-tracking shutdown in response to errors
Apr 15 18:58:29.870 svtlnxps05 pacemaker-controld [53865] (do_log) error: Input I_TERMINATE received in state S_RECOVERY from do_recover
Apr 15 18:58:29.878 svtlnxps05 pacemaker-controld [53865] (crmd_fast_exit) error: Could not recover from internal error
Apr 15 18:58:29.882 svtlnxps05 pacemaker-controld [53865] (crm_exit) info: Exiting pacemaker-controld | with status 1
Apr 15 18:58:29.882 svtlnxps05 pacemakerd [53853] (pcmk_child_exit) error: pacemaker-controld[53865] exited with status 1 (Error occurred)
What is the impact of this issue to you? The drop node operation failed.
Please provide the package NVR for which the bug is seen: 2.1.9-1
How reproducible is this bug?: Not sure; we have only hit it once so far.
Steps to reproduce
- Set up a cluster consisting of 4 hosts
- From one host, run operations that delete the resource definitions and remove the resource constraints defined on another host (see the sketch after this list)
- When the error is hit, Pacemaker shuts down and restarts itself. However, the node was fenced during the process, which resulted in unexpected behaviour
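A minimal sketch of that reproduction sequence, assuming a plain pcs-managed cluster; the cluster, node, resource, and constraint names are placeholders, and our actual environment drives the removal through its own tooling rather than these exact commands:

# Build a 4-node cluster (placeholder host names)
pcs host auth node1 node2 node3 node4 -u hacluster
pcs cluster setup testcluster node1 node2 node3 node4
pcs cluster start --all

# Create a resource with a location constraint tied to one node
pcs resource create dummy_rsc ocf:pacemaker:Dummy
pcs constraint location dummy_rsc prefers node2

# From a different host, remove the constraint and then the resource definition
pcs constraint config --full    # note the generated constraint ID, e.g. location-dummy_rsc-node2-INFINITY
pcs constraint remove location-dummy_rsc-node2-INFINITY
pcs resource delete dummy_rsc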