Type: Bug
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: rhos-18.0.6
Component/s: mariadb-operator
Labels:
None

Activity Type:
Bug Tracking
Story Points:
0
Epic Link:
[BugEpic]: mariadb-operator stops reconciling galera pods, leaving restarting in wait state
Blocked:
False
Blocked Reason:
None
Ready:
False
Docs Approval:
?
AssignedTeam:
rhos-ops-platform-services-pidone
Regression:
None
Intelligence Requested:
Market:
PX Impact Score:

Sprint:
Sprint 11, Sprint 14
sprint_count:
2
Severity:
Important

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description

Witnessed in a customer environment running a post-FR2 version of the mariadb-operator.

The pods of the two galera CRs were restarted following an environment issue, which stopped both galera clusters; the clusters then needed to be restarted.

However, we found no sign of any reconciliation event taking place in the mariadb-operator. It looked as if the stop event was never delivered to the operator, which therefore never restarted the clusters.

At this stage, the galera pods regularly failed their liveness probe because no galera server had been restarted, so the pods were restarted over and over.
The mariadb-operator currently does not pick this up, because it only reacts to changes in the StatefulSet's availableReplicas.
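The gap described above can be illustrated with a small Python model. This is a sketch under stated assumptions, not the operator's actual code: `Observation`, `fires_on_available_replicas`, `fires_on_restarts_too` and `reconcile_points` are hypothetical names, and `total_restarts` stands in for the sum of the pods' `containerStatuses[].restartCount`. It also assumes, as the description implies, that no pod ever becomes available during the liveness-probe restart loop, so availableReplicas stays at 0.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Observation:
    """Hypothetical snapshot of cluster state seen at each watch event."""
    available_replicas: int  # StatefulSet status.availableReplicas
    total_restarts: int      # assumption: summed pod container restart counts


def fires_on_available_replicas(prev: Observation, cur: Observation) -> bool:
    # Behaviour described in this report: only a change in
    # availableReplicas wakes the operator up.
    return prev.available_replicas != cur.available_replicas


def fires_on_restarts_too(prev: Observation, cur: Observation) -> bool:
    # Hypothetical alternative filter: also react when pods restart.
    return (fires_on_available_replicas(prev, cur)
            or prev.total_restarts != cur.total_restarts)


def reconcile_points(history: List[Observation],
                     predicate: Callable[[Observation, Observation], bool]) -> List[int]:
    """Indices in history at which the predicate would enqueue a reconcile."""
    return [i for i in range(1, len(history))
            if predicate(history[i - 1], history[i])]


# Liveness-probe restart loop: pods keep restarting, none becomes available.
crash_loop = [Observation(available_replicas=0, total_restarts=n) for n in range(4)]

reconcile_points(crash_loop, fires_on_available_replicas)  # → []
reconcile_points(crash_loop, fires_on_restarts_too)        # → [1, 2, 3]
```

With a filter keyed only on availableReplicas, the restart loop never produces a reconcile, which matches the observed behaviour. A restart-aware filter is only one possible option and is not necessarily the fix chosen for this issue.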

Bug impact

Major service disruption: the database service goes into an outage that is not resolved automatically.

Known workaround

Restarting the mariadb-operator forces an initial reconciliation, which lets the clusters be restarted.

Assignee: Damien Ciabrini

Reporter: Damien Ciabrini

Team: rhos-dfg-pidone

Votes: 0

Watchers: 1

Created: 2025/12/11 9:45 AM

Updated: 2026/01/26 2:36 PM
