Type: Bug
Resolution: Unresolved
Priority: Major
Affects Version/s: 4.18.z
Severity: Critical
Description of problem:
- The cluster's etcd database size surged from 3.8 GiB to 8 GiB within an 8-minute time frame.
- This occurred during a DR exercise in which 70 new nodes were added to the cluster, scaling it from 186 to 256 nodes in total (although only roughly 20-30 of the 70 new nodes actually joined the cluster).
- As a result, etcd went read-only and the etcd backend quota had to be manually increased to recover.
- The cluster's audit logs, must-gather, and Prometheus metrics did not point to any single resource type or client as the source of the growth.
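For reference, a sketch of how the growth could be broken down by key prefix, assuming shell access to an etcd member pod in the openshift-etcd namespace (the pod name below is a placeholder, and the key layout comment assumes the default Kubernetes key scheme):

```shell
# Open a shell in one of the etcd member pods (pod name is environment-specific).
oc -n openshift-etcd rsh etcd-<control-plane-node>

# Overall database size per endpoint and any active alarms (e.g. NOSPACE).
etcdctl endpoint status -w table
etcdctl alarm list

# Count keys per prefix to see which resource types dominate.
# Keys typically look like /kubernetes.io/<resource>/<namespace>/<name>.
etcdctl get / --prefix --keys-only \
  | awk -F/ 'NF > 2 {print $2 "/" $3}' \
  | sort | uniq -c | sort -rn | head -20
```

Note that key counts alone do not capture value sizes or MVCC history (old revisions retained between compactions), which can dominate the on-disk size even when key counts look unremarkable.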
Version-Release number of selected component (if applicable):
RHOCP 4.18.28
Actual results:
etcd database more than doubled in size when adding dozens of nodes to the cluster.
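For completeness, a sketch of the manual recovery path that was required, run from an etcd member pod after raising the quota. The `backendQuotaGiB` field on the etcd operator config and the 16 GiB value are illustrative assumptions; the exact quota used during the incident is not recorded above, and defragmentation must be run on each member in turn:

```shell
# Raise the etcd backend quota via the etcd operator config
# (field availability and the 16 GiB value are assumptions).
oc patch etcd/cluster --type=merge -p '{"spec": {"backendQuotaGiB": 16}}'

# Compact away old revisions, defragment to reclaim disk space,
# then clear the NOSPACE alarm so etcd accepts writes again.
rev=$(etcdctl endpoint status -w json | jq -r '.[0].Status.header.revision')
etcdctl compact "$rev"
etcdctl defrag
etcdctl alarm disarm
```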
Expected results:
Identify the root cause of the etcd database growth.