OpenShift Container Platform (OCP) Strategy: OCPSTRAT-1701

Default to sufficient space for large etcd databases and other database operations


    • Proactive Architecture


    • OCPSTRAT-892 Optimize & Enhance Etcd's Functionality, Performance, and Manageability w/ Hosted Control Planes
    • 100% To Do, 0% In Progress, 0% Done
    • Program Call

      User Story:

      • As a customer, I don't want my cluster to fall over because etcd ran out of disk space.
      • As an SRE, I don't want to have to rebuild broken etcd databases.
      • As a manager writing checks for management-cluster infrastructure and SREs, I would rather spend more on infrastructure and save on SRE time.

      so that we can achieve:

      • Healthy HyperShift growth.

      Acceptance Criteria:

      Description of criteria:

      • Filling an etcd database up to its quota ("pegging" it) and then running a defrag should not exhaust the available disk space on etcd's PersistentVolumes.

      (optional) Out of Scope:

      Determining whether gp3 PersistentVolumes are resizable.

      Engineering Details:

      • HOSTEDCP-947 took a run at this, but an 8 GiB etcd bbolt file only just fits into an 8 GiB PersistentVolume. That leaves no room for WAL files, the database copy that defrag generates, or anything else that needs to go into /var/lib. As long as this field is immutable, it's probably worth aiming very high (50 GiB? We should at least get a sanity check from the etcd folks), because storage is cheap and rebuilding etcd databases is expensive.
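      The sizing argument above can be made concrete with a back-of-the-envelope sketch. It assumes the worst case during a defrag (the copy is as large as the live bbolt file), a 64 MB WAL segment size, five retained WAL segments (etcd's --max-wals default), and a hypothetical 1 GiB of slack for raft snapshots and other files on the same volume. The names peak_defrag_bytes and fits are illustrative only, not part of any etcd or HyperShift API. Under these assumptions an 8 GiB database needs roughly 17.3 GiB at peak, which is why an 8 GiB volume cannot work.

```python
# Rough sizing sketch for an etcd PersistentVolume. All constants are
# assumptions, not values read from etcd or HyperShift.

GiB = 1024 ** 3

# Assumed WAL segment size: etcd preallocates 64 MB (64 * 1000 * 1000 byte) segments.
WAL_SEGMENT_BYTES = 64 * 1000 * 1000


def peak_defrag_bytes(db_bytes: int, wal_segments: int = 5, slack_bytes: int = 1 * GiB) -> int:
    """Estimate peak disk usage on etcd's volume while defragmenting.

    db_bytes:     size of the live bbolt file (e.g. the backend quota).
    wal_segments: assumed count of retained WAL segments (etcd's --max-wals
                  defaults to 5, but segments not yet covered by a snapshot
                  can push the real count higher).
    slack_bytes:  hypothetical allowance for raft snapshots and anything
                  else that lands on the same volume.
    """
    defrag_copy = db_bytes  # worst case: nothing to reclaim, copy is as big as the original
    wal = wal_segments * WAL_SEGMENT_BYTES
    return db_bytes + defrag_copy + wal + slack_bytes


def fits(pv_bytes: int, db_bytes: int, **kwargs) -> bool:
    """True if the estimated peak usage fits in a volume of pv_bytes."""
    return peak_defrag_bytes(db_bytes, **kwargs) <= pv_bytes


if __name__ == "__main__":
    db = 8 * GiB
    for pv_gib in (8, 20, 50):
        need = peak_defrag_bytes(db) / GiB
        print(f"{pv_gib:>2} GiB PV, 8 GiB db: need ~{need:.1f} GiB, fits={fits(pv_gib * GiB, db)}")
```

      The WAL count and slack are guesses; the sanity check from the etcd folks would be the place to replace them with real numbers.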

      This does not require a design proposal.
      This does not require a feature gate.

              Unassigned
              W. Trevor King
              Matthew Werner
              Adel Zaalouk
