  1. OpenShift Container Platform (OCP) Strategy
  2. OCPSTRAT-319

[internal] Explore options for hitless automatic defrag of etcd

    • BU Product Work
    • False
    • False
    • 33% To Do, 33% In Progress, 33% Done
    • 0
    • Program Call

      Feature Overview (aka. Goal Summary)  

      Several requests, such as RFE-2799 and OCPBU-247, share the same underlying cause: a certain level of disruption is experienced in clusters with a high transaction rate towards the API server.

      Goals (aka. expected user outcomes)

      After careful discussion and analysis, we should try to address these requests in a more cloud-native form while achieving the goal of a hitless defrag of the etcd database.

      Requirements (aka. Acceptance Criteria):

      The current defrag process is launched when at least 45% fragmentation is detected. On clusters with small etcd databases this process can complete within a few seconds, but on clusters with large etcd databases it can last long enough for users to experience failed transactions.

      This last scenario is where customers need an experience that does not affect their main business processes.
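
A minimal sketch of the threshold check described above, in Python. It assumes fragmentation is measured as the share of the database file that is allocated but not in use; the text does not define the measure, so this is an assumption. `MemberStatus`, `fragmentation_percent`, and `needs_defrag` are hypothetical names for illustration and not the operator's actual code. Only the 45% threshold comes from the text.

```python
from dataclasses import dataclass

# The 45% trigger is taken from the text; how fragmentation is measured is an assumption.
MAX_FRAGMENTED_PERCENT = 45.0


@dataclass
class MemberStatus:
    """Hypothetical snapshot of one etcd member's storage counters."""
    db_size_bytes: int         # bytes allocated on disk for the database file
    db_size_in_use_bytes: int  # bytes that actually hold live data


def fragmentation_percent(status: MemberStatus) -> float:
    """Return the share of the database file that is allocated but unused."""
    if status.db_size_bytes <= 0:
        return 0.0
    free_bytes = max(status.db_size_bytes - status.db_size_in_use_bytes, 0)
    return free_bytes / status.db_size_bytes * 100.0


def needs_defrag(status: MemberStatus,
                 threshold: float = MAX_FRAGMENTED_PERCENT) -> bool:
    """Return True once fragmentation reaches the trigger threshold."""
    return fragmentation_percent(status) >= threshold
```

A check like this considers only the fragmentation ratio. It does not account for how long the defrag will take on a large database, which is the gap this feature asks about.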

      Questions to Answer (Optional):

      • Can we improve the algorithm that triggers the defragmentation so that it is smarter about when it runs? (e.g. consider the etcd transaction rate, fragmentation vs. database size, "lazy defragmentation" below 2 GB and "force defragmentation" above 2 GB)
      • Can the operator be designed so that the rolling defragmentation of the etcd members does not block transactions during the process?

      For example, suppose the etcd defrag process forces the member to stop committing while it "replays" the copy of the data from memory to disk to create a new defragmented database. Can we improve the design of etcd so that, during the defrag process, the etcd member can continue operating on an alternate buffer or copy of the db?
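
One possible reading of the first question, sketched in Python. The 2 GB boundary and the 45% trigger come from the text; the transaction-rate limit, the `ClusterSample` type, and the `defrag_decision` function are hypothetical. The interpretation of "lazy" (wait for a quiet period) and "force" (run regardless of load) is also an assumption, not a confirmed design.

```python
from dataclasses import dataclass

GIB = 1024 ** 3

SIZE_BOUNDARY_BYTES = 2 * GIB          # "lazy" vs. "force" boundary from the text
FRAGMENTATION_TRIGGER_PERCENT = 45.0   # current trigger from the text
BUSY_TXN_PER_SECOND = 1000.0           # hypothetical limit for a "busy" cluster


@dataclass
class ClusterSample:
    """Hypothetical point-in-time measurements for one etcd member."""
    db_size_bytes: int
    fragmentation_percent: float
    txn_per_second: float


def defrag_decision(sample: ClusterSample) -> str:
    """Return 'skip', 'defer', or 'defrag' for the sampled member."""
    if sample.fragmentation_percent < FRAGMENTATION_TRIGGER_PERCENT:
        # Not fragmented enough to be worth the disruption.
        return "skip"
    if sample.db_size_bytes > SIZE_BOUNDARY_BYTES:
        # Assumed "force" mode: a large database is defragmented regardless of load.
        return "defrag"
    if sample.txn_per_second >= BUSY_TXN_PER_SECOND:
        # Assumed "lazy" mode: a small database can wait for a quieter period.
        return "defer"
    return "defrag"
```

For example, `defrag_decision(ClusterSample(GIB, 50.0, 5000.0))` defers the defrag of a busy 1 GiB database, while the same load on a 3 GiB database still triggers it. Whether large databases should instead be the ones that wait is exactly the kind of trade-off the question leaves open.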

      Out of Scope

      High-level list of items that are out of scope.  Initial completion during Refinement status.

       

              racedoro@redhat.com Ramon Acedo
              wcabanba@redhat.com William Caban
              Matthew Werner Matthew Werner
              Votes: 4
              Watchers: 20