Type: Bug
Resolution: Not a Bug
Priority: Normal
Fix Version/s: None
Affects Version/s: rhel-9.2.0
Component/s: pacemaker
Labels: None
Regression: No
Severity: None
Pool Team: rhel-sst-high-availability
Sub-System Group: ssg_filesystems_storage_and_HA
Story Points: 1
Blocked: False
Blocked Reason: None
Product Documentation Required: None
Sprint: None
Preliminary Testing: None
Test Coverage: None
Architecture: x86_64
SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:
Planning: None

What were you trying to do that didn't work? While running network down testing on a host, a resource on a different host was unexpectedly reassigned and started on a new host.

Here is the sequence of events:

The cluster consists of 4 hosts (ps-1, ps-2, ps-3, ps-4) and a QDevice.
At 2024-08-26-21.21.49, the ethernet interface on host ps-1 was taken down with ip link. Node ps-1 was fenced and the member resource failed over to host ps-2. Everything worked as expected.
At 2024-08-26-21.24.20, the ethernet interface on host ps-1 was brought back up with ip link. Node ps-1 rejoined the cluster and its resources restarted as expected.
At 2024-08-26-21.25.05, the scheduler unassigned resource 'db2_cfprimary_db2inst1' and then started it on a different host, ps-4. The resource was running on node ps-3 at the time. This was not expected.

Aug 26 21:25:05.710 ps-3 pacemaker-schedulerd[10161] (pcmk__unassign_resource) info: Unassigning db2_cfprimary_db2inst1

Aug 26 21:25:05.716 ps-3 pacemaker-schedulerd[10161] (log_list_item) notice: Actions: Start db2_cfprimary_db2inst1 ( ps-4 )
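To find every scheduler decision of this kind in the attached logs, the "Actions:" lines can be filtered out. The following is a minimal sketch, not part of Pacemaker: the regular expression and the `parse_action` helper are assumptions based only on the line format quoted in this report.

```python
import re

# Hypothetical pattern, derived from the "Actions:" lines quoted in this report.
ACTION_RE = re.compile(
    r"notice: Actions:\s+(?P<action>\w+)\s+(?P<resource>\S+)"
    r"\s+\(\s*(?P<nodes>.*?)\s*\)\s*$"
)

def parse_action(line):
    """Return (action, resource, nodes) for a scheduler action line, else None."""
    m = ACTION_RE.search(line)
    if m is None:
        return None
    return m.group("action"), m.group("resource"), m.group("nodes")

line = ("Aug 26 21:25:05.716 ps-3 pacemaker-schedulerd[10161] (log_list_item) "
        "notice: Actions: Start db2_cfprimary_db2inst1 ( ps-4 )")
print(parse_action(line))
```

Lines that are not scheduler action summaries, such as the "Unassigning" message, return None and can be skipped.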

As a result, db2_cfprimary_db2inst1 was stopped on ps-3 and then restarted on ps-4. This had the side effect of causing an error on another resource, db2_cf_db2inst1_128.

At 2024-08-26-21.25.17, the monitor for resource db2_cf_db2inst1_128 failed. This was expected because the primary failover had occurred earlier.

Aug 26 21:25:17.562 ps-3 pacemaker-controld [10163] (log_executor_event) notice: Result of monitor operation for db2_cf_db2inst1_128 on ps-3: not running | graph action unconfirmed; call=143 key=db2_cf_db2inst1_128_monitor_10000 rc=7

But the recovery action for resource db2_cf_db2inst1_128 was delayed by 108 seconds. This was not expected.

Aug 26 21:27:05.047 ps-3 pacemaker-schedulerd[10161] (log_list_item) notice: Actions: Recover db2_cf_db2inst1_128 ( ps-3 )
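The delay can be checked directly from the two log timestamps. This is a minimal sketch; the `parse_ts` helper is hypothetical, and the year is an assumption taken from the dates in this report because the log lines do not carry one. It also assumes an English locale for the month abbreviation.

```python
from datetime import datetime

def parse_ts(line, year=2024):
    """Parse the leading 'Mon DD HH:MM:SS.mmm' stamp of a log line.

    The year is not present in the log, so it is supplied by the caller.
    """
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S.%f")

monitor_failed = ("Aug 26 21:25:17.562 ps-3 pacemaker-controld [10163] "
                  "(log_executor_event) notice: Result of monitor operation "
                  "for db2_cf_db2inst1_128 on ps-3: not running")
recover = ("Aug 26 21:27:05.047 ps-3 pacemaker-schedulerd[10161] "
           "(log_list_item) notice: Actions: Recover db2_cf_db2inst1_128 ( ps-3 )")

delay = (parse_ts(recover) - parse_ts(monitor_failed)).total_seconds()
print(f"{delay:.3f} seconds")  # 107.485 seconds
```

The result, about 107.5 seconds, matches the roughly 108 seconds reported.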

Please provide the package NVR for which bug is seen:

Pacemaker 2.1.7-4.db2pcmk.el9.2

How reproducible: Intermittent

Steps to reproduce

Set up a cluster with 4 nodes and QDevice
Set up the Db2 pureScale resource model
Take down the ethernet interface on a member host

Expected results: All resources recover and restart successfully in a timely fashion.

Actual results: In this case, one resource, db2_cfprimary_db2inst1, moved to a different node unexpectedly, causing another resource to fail. A second issue was that it took 108 seconds for the recovery action to be triggered after the monitor failure for resource db2_cf_db2inst1_128.


unexpected-resource-move.tar.bz2
4.24 MB
2024/08/27 8:29 PM

Assignee: Kenneth Gaillot

Reporter: Lan Pham

Contributing Groups: IBM Confidential Group

Developer: Kenneth Gaillot

QA Contact: Cluster QE

Votes: 0

Watchers: 6

Created: 2024/08/27 8:22 PM

Updated: 2024/09/26 12:52 PM

Resolved: 2024/09/09 10:12 PM
