Type: Bug
Resolution: Unresolved
Priority: Normal
Fix Version/s: rhos-19.0.0
Affects Version/s: rhos-16.2.z
Component/s: mariadb-operator
Labels:
None

Story Points:
13
Blocked:
False
Blocked Reason:
None
Ready:
False
Docs Approval:
?
AssignedTeam:
rhos-ops-platform-services-pidone
Regression:
None
Intelligence Requested:
Market:
PX Impact Score:

Severity:
Moderate

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Galera nodes sometimes fail to fully synchronize after joining the cluster.

This is a follow-up of https://issues.redhat.com/browse/OSPRH-12518

This is another attempt at coming up with a conclusive RCA for the issue
reported in ~~OSPRH-12518~~. Under circumstances that involve a heavily
loaded environment, a Galera node joining an existing cluster would
correctly integrate the data it received from the SST it requested
over rsync, but would somehow fail to behave correctly after integrating
the remaining write sets received over IST to catch up with the state of
the cluster.

Additionally, although SST can be implemented with either rsync or
mariabackup, there is currently no evidence that the latter would
prevent this problem from occurring. Moreover, we have witnessed a
handful of cases where the error message "WSREP: Failed to apply trx"
reported in this Jira was seen in other environments. So the goal of
this Jira is to track the similarities, hopefully come up with a
definitive explanation of why this problem can occur, and determine
whether it is still an issue in OSP 17 and beyond.

For reference, after an initial discussion, we could not determine
from the logs of the original case that the rsync SST misbehaved in
any way. While this is not proof, we suspect that the issue may lie
in the way the IST is integrated after the SST.

This Jira is a tracker to log the actions needed to try to
reproduce the issue under the right environment conditions.

relates to

OSPRH-12518 BZ#2252279 [OSP 16] galera replication fails after SST with "[ERROR] WSREP: Failed to apply trx"

Closed

Assignee: Damien Ciabrini

Reporter: Damien Ciabrini

Team: rhos-dfg-pidone

Votes: 0

Watchers: 4

Created: 2025/08/11 4:05 PM

Updated: 2026/01/12 10:08 AM
