Bug
Resolution: Not a Bug
Critical
rhos-ops-platform-services-pidone
The customer reported intermittent issues with the Horizon dashboard, which displayed "Something went wrong!" error messages between 05:25 PM and 06:40 PM IST on specific dates. During this period, users were unable to view resources, and VM creation jobs failed because several backend OpenStack control-plane pods, including Galera (MariaDB), Cinder Scheduler, Cinder Volume, and Cinder Backup, were restarting automatically.
1- The Cinder and Galera pods keep restarting, preventing the customer from creating new instances.
2- Readiness and liveness probe connection timeouts are causing constant restarts; a manual check of the probe condition is sketched after the events below.
Events:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 42m (x18 over 23d) kubelet Readiness probe failed: command timed out
Warning Unhealthy 42m (x18 over 23d) kubelet Liveness probe failed: command timed out
Normal Started 41m (x16 over 21h) kubelet Started container galera
Normal Pulled 41m (x17 over 21h) kubelet Container image "registry.redhat.io/rhoso/openstack-mariadb-rhel9@sha256:2dd44ddf73d775c9b60421f14e4808bdda377cc57b864bf2d9a1bebd63fd6b41" already present on machine
Normal Created 41m (x17 over 21h) kubelet Created container: galera
Normal Killing 41m (x5 over 20h) kubelet Container galera failed startup probe, will be restarted
Warning FailedPreStopHook 41m (x2 over 19h) kubelet PreStopHook failed
Warning Unhealthy 41m kubelet Startup probe failed: ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (111)
Warning Unhealthy 34m kubelet Readiness probe failed: wsrep_local_state_comment (Inconsistent) differs from Synced
Warning Unhealthy 31m (x7 over 19h) kubelet Readiness probe failed: wsrep_local_state_comment (Initialized) differs from Synced
Warning BackOff 28m (x44 over 20h) kubelet Back-off restarting failed container galera in pod openstack-galera-0_openstack(65196497-8cc3-4245-8147-f58b6392eda1)
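For reference, the condition the readiness probe reports on can be checked manually from inside the pod. This is a minimal sketch, assuming the pod and namespace from the events above and that the root credentials are exposed in the container environment (the variable name MYSQL_ROOT_PASSWORD is an assumption; the actual credential comes from the corresponding Secret):
# Compare the wsrep state against "Synced", as the readiness probe does
$ oc exec -n openstack openstack-galera-0 -c galera -- \
    mysql -uroot -p"$MYSQL_ROOT_PASSWORD" \
    -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';"
# Check whether the node is part of the primary component and the cluster size
$ oc exec -n openstack openstack-galera-0 -c galera -- \
    mysql -uroot -p"$MYSQL_ROOT_PASSWORD" \
    -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster%';"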
3- The pods went into a CrashLoopBackOff state.
4- From the latest describe output, the container state is Running again, but the pod keeps restarting and has a high restart count (a quick way to pull the restart counts is sketched after the snippet below):
/usr/local/bin/kolla_start
State: Running
Started: Wed, 03 Dec 2025 17:20:59 +0530
Last State: Terminated
Reason: Error
Exit Code: 134
Started: Wed, 03 Dec 2025 17:10:48 +0530
Finished: Wed, 03 Dec 2025 17:18:36 +0530
Ready: True
Restart Count: 36
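A minimal sketch for pulling the restart counts and the previous crash logs in one pass, assuming the openstack namespace and the galera container name seen in the events (the custom-columns selection only reads the first container status of each pod):
# Restart counts for all pods in the namespace
$ oc get pods -n openstack -o custom-columns=NAME:.metadata.name,RESTARTS:.status.containerStatuses[0].restartCount
# Logs from the previous (crashed) instance of the galera container
$ oc logs openstack-galera-0 -n openstack -c galera --previous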
5- The Galera pod restarted and attempted an SST (State Snapshot Transfer). Because of the InnoDB corruption [1], the SST failed, so the node cannot rejoin the Galera cluster and the liveness probe fails again.
This is the loop we observed. (A sketch of how to assess the corruption and attempt a logical dump follows the log excerpt below.)
[1] 2025-12-03 11:48:34 80 [ERROR] InnoDB: Database page corruption on disk or a failed read of file './neutron/networksegments.ibd' page [page id: space=447, page number=0]. You may have to recover from a backup.
2025-12-03 11:48:34 80 [Note] InnoDB: Page dump (16384 bytes):
2025-12-03 11:48:34 80 [Note] InnoDB: 1703030013d86c6ebedfef30aa4ef2ea2caa86a81bdabcbd0008000000000000
...
2025-12-03 11:48:34 80 [Note] InnoDB: 0000000000000000000000000000000000000000000000000057803933ccf488
2025-12-03 11:48:34 80 [Note] InnoDB: End of page dump
2025-12-03 11:48:34 80 [Note] InnoDB: You can use CHECK TABLE to scan your table for corruption. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
2025-12-03 11:48:34 80 [ERROR] [FATAL] InnoDB: Unable to read page [page id: space=447, page number=0] into the buffer pool after 100. The most probable cause of this error may be that the table has been corrupted. See https://mariadb.com/kb/en/library/innodb-recovery-modes/
251203 11:48:34 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
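Following the hints in the log, the damage can be assessed with CHECK TABLE, and if mysqld will not stay up, InnoDB can be started in forced-recovery mode to attempt a logical dump before the member is rebuilt. This is a hedged sketch, not a fix: the exec path and credentials are the same assumptions as above, innodb_force_recovery should be raised one step at a time, and values of 4 and above block data-changing statements.
# Scan the table named in the error for corruption
$ oc exec -n openstack openstack-galera-0 -c galera -- \
    mysql -uroot -p"$MYSQL_ROOT_PASSWORD" -e "CHECK TABLE neutron.networksegments;"
# If mysqld keeps aborting, add a forced-recovery setting to the server config,
# restart, and take a logical dump of whatever is still readable:
#   [mysqld]
#   innodb_force_recovery = 1
$ mysqldump -uroot -p"$MYSQL_ROOT_PASSWORD" --all-databases > /tmp/all-databases.sql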
Unfortunately, the customer does not have any MariaDB backup.
We need help resolving the issue and identifying the root cause of the corruption (one thing worth ruling out first is I/O errors on the backing storage, sketched below).
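Since the mysqld backtrace lists malfunctioning hardware as a possible cause, it may be worth checking the node and the PVC backing this Galera member for I/O errors before rebuilding it. A minimal sketch, with <node-name> as a placeholder for whichever node hosts openstack-galera-0:
# Find the node and PVC behind the corrupted member
$ oc get pod openstack-galera-0 -n openstack -o wide
$ oc get pvc -n openstack | grep galera
# Look for block-layer or filesystem I/O errors on that node
$ oc debug node/<node-name> -- chroot /host dmesg -T | grep -iE 'i/o error|xfs|ext4'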