Description of problem:

The cluster's etcd database size surged from 3.8 GiB to 8 GiB within an 8-minute time frame.
- This happened during a DR exercise in which 70 new nodes were added to the cluster, scaling it from 186 to a planned total of 256 nodes. Only roughly 20-30 of the 70 new nodes actually joined the cluster.
- As a result, etcd went into read-only mode, and the etcd backend quota had to be adjusted manually to recover.
The cluster's audit logs, must-gather, and Prometheus metrics showed no single suspect responsible for that kind of growth.
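
For a sense of scale, the figures above can be turned into a rough growth rate. The following is only a back-of-the-envelope sketch: the function name `growth_summary` is hypothetical, and dividing the growth evenly across the joined nodes is an assumption made for illustration, not a finding from the cluster data.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def growth_summary(start_gib, end_gib, minutes, nodes_joined):
    """Summarize database growth from the sizes reported in this issue.

    Assumes (for illustration only) that growth is spread evenly over
    the time window and over the nodes that joined.
    """
    grown_bytes = (end_gib - start_gib) * GIB
    return {
        "growth_gib": grown_bytes / GIB,
        "rate_mib_per_min": grown_bytes / MIB / minutes,
        "mib_per_node": grown_bytes / MIB / nodes_joined,
    }


# The report gives a range of 20-30 joined nodes, so evaluate both ends.
for nodes in (20, 30):
    s = growth_summary(3.8, 8.0, 8, nodes)
    print(
        f"{nodes} nodes: +{s['growth_gib']:.1f} GiB, "
        f"{s['rate_mib_per_min']:.1f} MiB/min, "
        f"{s['mib_per_node']:.1f} MiB per joined node"
    )
```

Under these assumptions the growth works out to several hundred MiB per minute and well over 100 MiB per joined node.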

Version-Release number of selected component (if applicable):

RHOCP 4.18.28

Actual results:

The etcd database more than doubled in size (from 3.8 GiB to 8 GiB) when dozens of nodes were added to the cluster.

Expected results:

Identify the root cause of the database growth.