Bug
Resolution: Unresolved
Critical
rhos-18.0 FR 2 (Mar 2025)
mariadb-operator-container-1.0.14-1
Sprint 2, Sprint 3, Sprint 4, Sprint 5
Critical
To Reproduce
Steps to reproduce the behavior:
- Start with a correctly functioning RHOSO test cluster.
- During a faulty test, unintentionally scale a deployment to 300 replicas.
- The cluster's system resources are exhausted.
- The Galera cluster becomes non-functional.
Expected behavior
- The Galera cluster should keep working, protected by its resource reservations.
- Spawning 300 pod replicas should fail, at least partially, if the cluster's remaining resources do not allow it. This scale should not impact the
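The expected behavior above relies on how the Kubernetes scheduler admits pods: a replica is placed on a node only if its CPU and memory requests fit into the node's allocatable capacity minus the requests of pods already bound there. A pod that declares no requests never fails that check. The minimal Python sketch below models only this requests-based fit check; the node names, capacities, and per-pod requests are hypothetical, and other scheduler checks (per-node pod limits, taints, affinity) are ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    """Hypothetical node snapshot; units are millicores and MiB."""
    name: str
    allocatable_cpu_m: int
    allocatable_mem_mi: int
    requested_cpu_m: int   # sum of CPU requests of pods already bound here
    requested_mem_mi: int  # sum of memory requests of pods already bound here


def schedulable_replicas(nodes: List[Node], cpu_req_m: int,
                         mem_req_mi: int, desired: int) -> int:
    """Count how many identical replicas pass a requests-based fit check."""
    if cpu_req_m == 0 and mem_req_mi == 0:
        # No requests: the fit check never rejects the pod, so every
        # replica is admitted even on nodes that are already saturated.
        return desired
    total = 0
    for node in nodes:
        free_cpu = max(node.allocatable_cpu_m - node.requested_cpu_m, 0)
        free_mem = max(node.allocatable_mem_mi - node.requested_mem_mi, 0)
        per_node = []
        if cpu_req_m > 0:
            per_node.append(free_cpu // cpu_req_m)
        if mem_req_mi > 0:
            per_node.append(free_mem // mem_req_mi)
        # The scarcest requested resource bounds how many replicas fit here.
        total += min(per_node)
    return min(total, desired)


nodes = [
    Node("worker-0", 16000, 65536, 4000, 16384),
    Node("worker-1", 16000, 65536, 12000, 49152),
]
print(schedulable_replicas(nodes, 500, 1024, 300))  # 24 + 8 = 32
print(schedulable_replicas(nodes, 0, 0, 300))       # 300
```

With requests set, only 32 of the 300 replicas are admitted and the rest stay Pending instead of consuming capacity that other workloads such as Galera depend on. With no requests, all 300 are admitted regardless of how loaded the nodes already are.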
Device Info
- RHOSO FR2 deployment in a DCN/DZ environment with 3 AZs, and therefore 3 cells (in addition to cell0).
Bug impact
- The Galera cluster became non-functional.
- The Galera operator is unable to recover or rebuild the cluster after a full outage.
- The customer is seriously concerned about whether RHOSO can be used in their production environment.
Known workaround
- Not tested yet, but worth exploring: set resource requests (reservations) and limits (RAM, CPU, etc.) on the Galera pods through the infra-operator. This would guarantee that resources are reserved for these pods and prevent a Galera outage when the cluster is scaled.
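Kubernetes derives each pod's QoS class from its containers' requests and limits. A pod is Guaranteed only if every container sets CPU and memory limits and its requests equal those limits; Guaranteed pods are the least likely to be evicted or OOM-killed when a node runs short of memory. For the strongest protection, the workaround should therefore set requests equal to limits, not just any values. The Python sketch below reproduces that classification rule in simplified form (regular containers only, quantities pre-normalized to integer millicores and MiB). The 2000m/8192 MiB figures are illustrative, and how the infra-operator or the Galera CR actually exposes these settings is an assumption not confirmed here.

```python
from typing import Dict, List

# Hypothetical container spec: integer millicores for "cpu", MiB for "memory",
# so equal quantities compare equal without parsing Kubernetes quantity strings.
Container = Dict[str, Dict[str, int]]

COMPUTE_RESOURCES = ("cpu", "memory")


def qos_class(containers: List[Container]) -> str:
    """Return the Kubernetes QoS class implied by requests/limits.

    Simplified: init containers and extended resources are ignored.
    """
    anything_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        for resource in COMPUTE_RESOURCES:
            request = requests.get(resource)
            limit = limits.get(resource)
            if request is not None or limit is not None:
                anything_set = True
            if request is None:
                # The API server defaults an omitted request to the limit.
                request = limit
            if limit is None or request != limit:
                guaranteed = False
    if not anything_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


galera = [{"requests": {"cpu": 2000, "memory": 8192},
           "limits": {"cpu": 2000, "memory": 8192}}]
print(qos_class(galera))                            # Guaranteed
print(qos_class([{}]))                              # BestEffort
print(qos_class([{"requests": {"memory": 1024}}]))  # Burstable
```

Setting only requests would already let the scheduler reserve capacity for Galera, but the pods would then be Burstable; matching limits to requests is what moves them into the Guaranteed class.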