- Bug
- Resolution: Not a Bug
- Normal
- None
- rhel-9.2.0
- None
- No
- None
- rhel-sst-high-availability
- ssg_filesystems_storage_and_HA
- 1
- False
- None
- None
- None
- None
- x86_64
- None
What were you trying to do that didn't work?
While running network-down testing on one host, a resource running on a different host was unexpectedly reassigned and started on a new host.
Here is the sequence of events:
- The cluster consists of 4 hosts (ps-1, ps-2, ps-3, ps-4) and a QDevice.
- At 2024-08-26-21.21.49, the ethernet interface on host ps-1 was taken down (ip link down). Node ps-1 was fenced and the member resource failed over to ps-2. Everything worked as expected.
- At 2024-08-26-21.24.20, the ethernet interface on host ps-1 was brought back up (ip link up). Node ps-1 rejoined the cluster and its resources restarted as expected.
- At 2024-08-26-21.25.05, the scheduler unassigned resource 'db2_cfprimary_db2inst1' and then started it on a different host, ps-4. The resource was running on node ps-3 at the time. This was not expected.
Aug 26 21:25:05.710 ps-3 pacemaker-schedulerd[10161] (pcmk__unassign_resource) info: Unassigning db2_cfprimary_db2inst1
Aug 26 21:25:05.716 ps-3 pacemaker-schedulerd[10161] (log_list_item) notice: Actions: Start db2_cfprimary_db2inst1 ( ps-4 )
- As a result, db2_cfprimary_db2inst1 was stopped on ps-3 and then restarted on ps-4. This had the side effect of causing an error on another resource, db2_cf_db2inst1_128.
- At 2024-08-26-21.25.17, the monitor for resource db2_cf_db2inst1_128 failed. This was expected because the primary failover had occurred earlier.
Aug 26 21:25:17.562 ps-3 pacemaker-controld [10163] (log_executor_event) notice: Result of monitor operation for db2_cf_db2inst1_128 on ps-3: not running | graph action unconfirmed; call=143 key=db2_cf_db2inst1_128_monitor_10000 rc=7
- However, the recovery action for resource db2_cf_db2inst1_128 was delayed by 108 seconds. This was not expected.
Aug 26 21:27:05.047 ps-3 pacemaker-schedulerd[10161] (log_list_item) notice: Actions: Recover db2_cf_db2inst1_128 ( ps-3 )
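The delay quoted above can be checked directly from the two log timestamps. The following is a minimal Python sketch, not part of the original report: the helper names (log_timestamp, delay_seconds) are hypothetical, the log lines are abbreviated copies of the ones pasted above, and the year 2024 is supplied as an assumption because the Pacemaker log stamp does not include it.

```python
from datetime import datetime

# Abbreviated copies of the two log lines from this report.
MONITOR_FAILED = (
    "Aug 26 21:25:17.562 ps-3 pacemaker-controld [10163] (log_executor_event) "
    "notice: Result of monitor operation for db2_cf_db2inst1_128 on ps-3: not running"
)
RECOVER_SCHEDULED = (
    "Aug 26 21:27:05.047 ps-3 pacemaker-schedulerd[10161] (log_list_item) "
    "notice: Actions: Recover db2_cf_db2inst1_128 ( ps-3 )"
)


def log_timestamp(line, year=2024):
    """Parse the leading 'Mon DD HH:MM:SS.mmm' stamp of a log line.

    The log stamp carries no year, so it is passed in (assumed 2024 here).
    """
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S.%f")


def delay_seconds(earlier, later):
    """Return the time between two log lines, in seconds."""
    return (log_timestamp(later) - log_timestamp(earlier)).total_seconds()


if __name__ == "__main__":
    print(f"{delay_seconds(MONITOR_FAILED, RECOVER_SCHEDULED):.3f} s")
```

For these two lines the gap comes out at 107.485 seconds, which matches the reported figure of 108 seconds when the timestamps are compared at whole-second resolution (21:25:17 to 21:27:05).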
Please provide the package NVR for which bug is seen:
Pacemaker 2.1.7-4.db2pcmk.el9.2
How reproducible: Intermittent
Steps to reproduce
- Set up a cluster with 4 nodes and a QDevice
- Set up the Db2 pureScale resource model
- Take down the ethernet interface on a member host