Type: Bug
Resolution: Unresolved
Priority: Undefined
Fix Version/s: None
Affects Version/s: None
Component/s: resource-agents
Labels:
- RFE

Regression:
No
Severity:
Moderate
Customer Impact:

Customer Escalated

AssignedTeam:
rhel-ha

Story Points:
3
Blocked:
False
Ready:
False
Blocked Reason:
None
Product Documentation Required:
None
Sprint:
None

Preliminary Testing:
None
Test Coverage:
None

ProdDocsReview-CCS:
Unspecified
ProdDocsReview-Dev:
Unspecified
ProdDocsReview-QE:
Unspecified

Experience:

PX Impact Score:
SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Planning:
None

Description:

When the Pacemaker db2 resource agent (RA) initiates a promote operation on a standby node, it uses the timeout parameter to control how long it should wait for the promotion to succeed.

However, during Db2 HADR failover scenarios, especially when the standby is in:

HADR state: STANDBY/REMOTE_CATCHUP_PENDING/DISCONNECTED

…the takeover can appear to hang because Db2 is still performing log replay, which can take several minutes depending on the log gap and workload.

Current Behavior:

The promote action is terminated after the configured timeout.

This results in unnecessary failover aborts, fencing, or retries, even though Db2 is still actively progressing through log replay during takeover.
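
For context, Pacemaker passes the configured operation timeout to OCF agents in the `OCF_RESKEY_CRM_meta_timeout` environment variable, in milliseconds. The following is a minimal sketch of how an agent could derive a wait budget from it. The function name `promote_budget_seconds`, the 20000 ms fallback, and the 10-second safety margin are assumptions for illustration and are not taken from the existing db2 RA.

```shell
#!/bin/sh
# Sketch only: promote_budget_seconds is a hypothetical helper.
# OCF_RESKEY_CRM_meta_timeout is exported by Pacemaker in milliseconds.

# Print the number of whole seconds the action may spend waiting,
# leaving a safety margin (first argument, default 10 s) before the timeout.
promote_budget_seconds() {
    timeout_ms="${OCF_RESKEY_CRM_meta_timeout:-20000}"  # fallback is an assumption
    margin_s="${1:-10}"
    budget=$(( timeout_ms / 1000 - margin_s ))
    if [ "$budget" -lt 0 ]; then
        budget=0
    fi
    echo "$budget"
}
```

Because Pacemaker itself enforces the operation timeout, a budget like this only tells the agent how long it can keep waiting inside the configured limit.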

Requested Enhancement:

Amend the promote action logic in the resource agent to:

Check the HADR state (using db2pd -hadr or equivalent internal method).

If the state is:

STANDBY/REMOTE_CATCHUP_PENDING
AND log replay is still progressing (as inferred from replay log LSN or known active state),

THEN suppress or extend the timeout, allowing Db2 takeover to complete gracefully.
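
The state check described above could be sketched as follows. This assumes `db2pd -hadr` prints `KEY = VALUE` lines such as `HADR_STATE = REMOTE_CATCHUP_PENDING` and `STANDBY_REPLAY_LOG_FILE,PAGE,POS = <file>, <page>, <pos>`; the exact layout should be verified against the Db2 release in use. The helper names `hadr_field` and `standby_replay_pos` are hypothetical and do not exist in the current RA.

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical and read db2pd -hadr output on stdin.

# Print the value of the first "KEY = VALUE" line whose key matches $1.
hadr_field() {
    awk -F' = ' -v key="$1" '
        { k = $1; gsub(/^[ \t]+|[ \t]+$/, "", k) }
        k == key { v = $2; gsub(/^[ \t]+|[ \t]+$/, "", v); print v; exit }
    '
}

# Print the position (third comma-separated item) of the standby replay log.
standby_replay_pos() {
    hadr_field "STANDBY_REPLAY_LOG_FILE,PAGE,POS" | awk -F', *' '{ print $3 }'
}
```

A caller might use `db2pd -hadr -db "$db" | hadr_field HADR_STATE`, and compare two successive `standby_replay_pos` readings to infer whether replay is still advancing.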

Code Context:

Relevant RA: resource-agents/heartbeat/db2

Target function:
Lines ~557–560, inside the promote operation logic.

{{# promote action

db2 takeover ... currently uses timeout}}
Suggested Hook:

Add a conditional wrapper like:

~~~
if hadr_state == "REMOTE_CATCHUP_PENDING" && log_replay_active; then
    sleep + monitor replay progress
    continue waiting
else
    proceed/exit on timeout
fi
~~~

This ensures that ongoing, valid log replay is not treated as a failure.
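
A runnable version of the suggested hook might look like the sketch below. It is not the RA's implementation: `wait_for_replay` is a hypothetical function, and `get_hadr_state` and `get_replay_pos` are hypothetical callbacks that a real agent would have to provide, for example by wrapping `db2pd -hadr`. The sketch treats an unchanged replay position between two checks as a stall.

```shell
#!/bin/sh
# Sketch only. get_hadr_state and get_replay_pos are hypothetical callbacks
# that the caller must define before invoking wait_for_replay.
#
# Returns 0 once the state leaves REMOTE_CATCHUP_PENDING,
# 1 if replay stops advancing or the wait budget is used up.
wait_for_replay() {
    budget_s="$1"      # total seconds we are allowed to wait
    interval_s="$2"    # seconds between progress checks
    waited=0
    last_pos=""
    while [ "$(get_hadr_state)" = "REMOTE_CATCHUP_PENDING" ]; do
        pos="$(get_replay_pos)"
        # Replay position did not move since the previous check: stalled.
        if [ -n "$last_pos" ] && [ "$pos" = "$last_pos" ]; then
            return 1
        fi
        # Wait budget exhausted: give up and let the caller fail the action.
        if [ "$waited" -ge "$budget_s" ]; then
            return 1
        fi
        last_pos="$pos"
        sleep "$interval_s"
        waited=$(( waited + interval_s ))
    done
    return 0
}
```

The budget argument keeps the loop bounded, so the wait still ends before the operation timeout configured in Pacemaker.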

Business Justification:

One of our customers (Account ID: 402911), who is actively deploying Pacemaker-based Db2 HADR, has observed this behavior (support case 04177362).

The customer requires this enhancement to avoid false negatives during automatic failover scenarios.

links to

db2: Suppress promote timeout in Pacemaker db2 RA when HADR standby is in REMOTE_CATCHUP_PENDING state and log replay is in progress

Assignee: Oyvind Albrigtsen

Reporter: Dhananjay Mule

Developer: Oyvind Albrigtsen

QA Contact: Cluster QE

Votes: 0

Watchers: 8

Created: 2025/07/01 10:32 AM

Updated: 2025/10/10 7:20 AM

Stale Date: 2026/06/30

[RFE] db2: Suppress promote timeout in Pacemaker db2 RA when HADR standby is in REMOTE_CATCHUP_PENDING state and log replay is in progress
