  1. OpenShift Bugs
  2. OCPBUGS-76107

Read-only etcd after a massive database growth in a short time frame

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • 4.18.z
    • Critical

      Description of problem:

      • The cluster's etcd database grew from 3.8 GiB to 8 GiB within 8 minutes.
        • This happened during a DR exercise in which 70 new nodes were added to the cluster, which would have scaled it from 186 to 256 nodes. Only roughly 20-30 of the 70 nodes actually joined.
        • As a result, etcd went read-only, and the etcd backend quota had to be adjusted manually to recover.
      • The cluster's audit logs, must-gather, and Prometheus metrics showed no single suspect responsible for that growth.
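
To put the reported numbers in perspective, the sketch below converts them into a sustained write rate and a rough per-node figure. It uses only the sizes, window, and node counts stated above. The per-node split assumes the growth was caused by the joining nodes, which this report has not established. The function names are illustrative.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

def growth_rate(start_gib, end_gib, minutes):
    """Return (bytes added, MiB per second) for a size change over a time window."""
    delta_bytes = (end_gib - start_gib) * GIB
    return delta_bytes, delta_bytes / MIB / (minutes * 60)

def per_node_mib(delta_bytes, nodes):
    """Average MiB of growth per node, ASSUMING the nodes caused all of it."""
    return delta_bytes / MIB / nodes

# Figures from the description: 3.8 GiB -> 8 GiB in 8 minutes, 20-30 nodes joined.
delta, rate = growth_rate(3.8, 8.0, 8)
print(f"{delta / GIB:.1f} GiB added, about {rate:.1f} MiB/s sustained")
print(f"{per_node_mib(delta, 30):.0f} to {per_node_mib(delta, 20):.0f} MiB per joined node")
```

That is roughly 9 MiB/s of net database growth for 8 minutes, or on the order of 143 to 215 MiB per joined node if the nodes were the only source.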

      Version-Release number of selected component (if applicable):

      RHOCP 4.18.28

      Actual results:

      The etcd database more than doubled in size when dozens of nodes were added to the cluster.

      Expected results:

      Identify the root cause of the database growth.
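
One possible starting point for narrowing this down, offered as a sketch and not as an established procedure for this case, is to tally etcd keys by resource type from the output of `etcdctl get / --prefix --keys-only` and compare tallies taken before and after the growth. The helper name `tally_keys` and the sample keys below are hypothetical. The sketch assumes keys of the form `/<root>/<resource or API group>/...`. Key counts are not the same as bytes, so a resource with few but very large objects would not stand out here.

```python
from collections import Counter

def tally_keys(lines):
    """Count keys per second path segment (resource or API group name)."""
    counts = Counter()
    for line in lines:
        key = line.strip()
        if not key.startswith("/"):
            continue  # --keys-only output separates keys with blank lines
        parts = key.split("/")
        # parts[0] is "" (leading slash), parts[1] the root prefix,
        # parts[2] the resource or API group when present.
        segment = parts[2] if len(parts) > 2 else parts[1]
        counts[segment] += 1
    return counts

# Hypothetical sample standing in for real `--keys-only` output.
sample = [
    "/kubernetes.io/pods/ns-a/pod-1",
    "",
    "/kubernetes.io/pods/ns-a/pod-2",
    "",
    "/kubernetes.io/events/ns-a/evt-1",
]

for segment, count in tally_keys(sample).most_common():
    print(f"{count:>8}  {segment}")
```

To use it on a live member, the same function can be fed `sys.stdin` with the `etcdctl` output piped in.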

              rhn-support-liqcui Liquan Cui
              rhn-support-rsandu Robert Sandu
              Huiran Wang Huiran Wang