OpenShift Etcd / ETCD-626

Enable upstream flag to improve availability during defrag


    • Type: Epic
    • Resolution: Unresolved
    • Priority: Major
    • BU Product Work
    • Status: To Do
    • Parent: OCPSTRAT-319 - [internal] Explore options for hitless automatic defrag of etcd
    • Progress: 80% To Do, 20% In Progress, 0% Done

      Epic Goal

      Enable the `--experimental-stop-grpc-service-on-defrag` flag on etcd so that client requests are no longer sent to an etcd member while it is undergoing defragmentation.

      After enabling the flag, we want the perfscale team to validate etcd performance on a cluster that serves a large number of API requests during defragmentation.
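      The epic does not define how the perfscale validation will measure availability. As a minimal sketch, assuming a load generator records each API request's timestamp and whether it succeeded, one simple measure is the success ratio inside the defragmentation window compared with the ratio outside it. `RequestSample`, `availability`, and `defrag_degradation` are hypothetical names used for illustration, not existing perfscale tooling.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class RequestSample:
    """One API request recorded by a hypothetical load generator."""
    timestamp: float  # seconds since the start of the test run
    ok: bool          # True if the request succeeded within its deadline


def availability(samples: Iterable[RequestSample],
                 start: Optional[float] = None,
                 end: Optional[float] = None) -> Optional[float]:
    """Fraction of successful requests with start <= timestamp < end.

    A bound of None leaves the window open on that side.
    Returns None when no request falls inside the window.
    """
    in_window = [
        s for s in samples
        if (start is None or s.timestamp >= start)
        and (end is None or s.timestamp < end)
    ]
    if not in_window:
        return None
    return sum(s.ok for s in in_window) / len(in_window)


def defrag_degradation(samples: Iterable[RequestSample],
                       defrag_start: float,
                       defrag_end: float) -> Optional[float]:
    """Baseline availability (outside the defrag window) minus availability during it.

    A positive value means requests failed more often while a member was defragmenting.
    """
    samples_list: List[RequestSample] = list(samples)
    during = availability(samples_list, defrag_start, defrag_end)
    outside = [s for s in samples_list
               if not (defrag_start <= s.timestamp < defrag_end)]
    baseline = availability(outside)
    if during is None or baseline is None:
        return None
    return baseline - during
```

      A result near zero would be consistent with scenario 2 below; what tolerance counts as acceptable is not specified in this epic.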

       
      Why is this important? (mandatory)

      See https://issues.redhat.com/browse/OCPSTRAT-319 for background. In short, an etcd member that is undergoing defragmentation (for example, the leader) does not serve client requests until defragmentation finishes, so on large clusters, steering requests away from that member can improve API availability.

      Also see: https://github.com/kubernetes/kubernetes/issues/93280
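      A simplified model of why this helps follows. Our understanding of the upstream change, to be confirmed against the etcd release in use, is that with the flag enabled a member reports NOT_SERVING through the standard gRPC health service while it defragments, so a client that performs gRPC health checking can send requests to another member instead. Without the flag, the member keeps reporting SERVING while its requests block. The sketch models only that routing decision; `Health`, `reported_health`, and `pick_endpoint` are hypothetical and are not etcd or gRPC APIs.

```python
from enum import Enum
from typing import List, Optional, Tuple


class Health(Enum):
    """Health states a member can advertise (names borrowed from gRPC health checking)."""
    SERVING = "SERVING"
    NOT_SERVING = "NOT_SERVING"


def reported_health(defragmenting: bool, stop_grpc_on_defrag: bool) -> Health:
    """Health a member advertises in this model.

    Assumption: with --experimental-stop-grpc-service-on-defrag enabled, a member
    advertises NOT_SERVING while it defragments; without the flag it keeps
    advertising SERVING even though it cannot answer requests until defrag ends.
    """
    if defragmenting and stop_grpc_on_defrag:
        return Health.NOT_SERVING
    return Health.SERVING


def pick_endpoint(members: List[Tuple[str, bool]],
                  stop_grpc_on_defrag: bool,
                  start: int = 0) -> Optional[str]:
    """Hypothetical health-aware client: round-robin from `start`, skipping NOT_SERVING members.

    members: (name, defragmenting) pairs.
    Returns the chosen member name, or None if no member advertises SERVING.
    """
    n = len(members)
    for i in range(n):
        name, defragmenting = members[(start + i) % n]
        if reported_health(defragmenting, stop_grpc_on_defrag) is Health.SERVING:
            return name
    return None
```

      With `members = [("etcd-0", True), ("etcd-1", False), ("etcd-2", False)]`, the model picks the defragmenting `etcd-0` when the flag is off and `etcd-1` when it is on. In this model the benefit depends entirely on clients honoring the advertised health status.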
        

      Scenarios (mandatory) 

      1. The `--experimental-stop-grpc-service-on-defrag` flag should be enabled on all etcd members running in the cluster.
      2. There should not be a degradation in API availability during defragmentation of etcd members.
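      Scenario 1 could be checked mechanically. A minimal sketch, assuming each member's etcd command-line arguments have already been collected and split into a list (for example from the etcd pod spec; collecting them is out of scope here), reports the members whose arguments do not enable the flag. The parsing is simplified and, as far as we know, mirrors how Go's standard `flag` package accepts boolean flags (`--flag`, `-flag`, `--flag=true`, last occurrence wins); it ignores environment variables and config files. `flag_enabled` and `members_missing_flag` are hypothetical helpers.

```python
from typing import Dict, List, Optional, Tuple

FLAG = "experimental-stop-grpc-service-on-defrag"
# Spellings Go's strconv.ParseBool treats as true (assumed to match etcd's bool flag parsing).
_TRUE_VALUES = {"1", "t", "T", "true", "TRUE", "True"}


def _split_flag(arg: str) -> Optional[Tuple[str, Optional[str]]]:
    """Split '--name=value' or '-name' into (name, value); non-flag args return None."""
    if arg.startswith("--"):
        body = arg[2:]
    elif arg.startswith("-"):
        body = arg[1:]
    else:
        return None
    name, sep, value = body.partition("=")
    return name, (value if sep else None)


def flag_enabled(args: List[str]) -> bool:
    """True if the last occurrence of the flag in args enables it (hypothetical helper)."""
    enabled = False
    for arg in args:
        parsed = _split_flag(arg)
        if parsed is None or parsed[0] != FLAG:
            continue
        value = parsed[1]
        # A bare boolean flag means true; any other value must be a true spelling.
        enabled = value is None or value in _TRUE_VALUES
    return enabled


def members_missing_flag(member_args: Dict[str, List[str]]) -> List[str]:
    """Sorted names of members whose arguments do not enable the flag (hypothetical helper)."""
    return sorted(name for name, args in member_args.items() if not flag_enabled(args))
```

      An empty result would satisfy scenario 1 for the members inspected.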

      Depending on the perfscale results, we may want to first enable this flag by default on TechPreview clusters.

      Dependencies (internal and external) (mandatory)

      We will need to coordinate a story with the perfscale team to validate etcd performance with this flag on.

      Contributing Teams(and contacts) (mandatory) 

      Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

      • Development - etcd team
      • Documentation - etcd docs
      • QE - etcd QE
      • PX - 
      • Others -

      Acceptance Criteria (optional)

      Provide some (testable) examples of how we will know if we have achieved the epic goal.  

      Drawbacks or Risk (optional)

      Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.

      Done - Checklist (mandatory)

      The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

      • CI Testing - Basic e2e automation tests are merged and completing successfully
      • Documentation - Content development is complete.
      • QE - Test scenarios are written and executed successfully.
      • Technical Enablement - Slides are complete (if requested by PLM)
      • Engineering Stories Merged
      • All associated work items with the Epic are closed
      • Epic status should be “Release Pending” 

              melbeher@redhat.com Mustafa Elbehery
              rhn-coreos-htariq Haseeb Tariq
