OpenShift Container Platform (OCP) Strategy / OCPSTRAT-471

etcd: improve slow follower reliability & performance


    • Type: Feature
    • Resolution: Unresolved
    • Priority: Normal
    • Core
    • 50% To Do, 0% In Progress, 50% Done

      Feature Overview (aka. Goal Summary)  

      This feature aims to improve the experience of etcd clients in the following scenarios:

      1) When the etcd leader is the member with the highest disk or network latency. If a node experiencing storage or network issues is elected leader and remains leader, the whole etcd cluster suffers degraded performance, even when the other etcd members are in excellent condition. Under these circumstances, we should provide a way for the etcd operator to trigger a new election and penalize the node with the suboptimal conditions.
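      A minimal sketch of the kind of decision scenario 1 calls for, written in Python for brevity. Everything in it is hypothetical: `MemberLatency`, `pick_transferee`, the latency fields, and the `ratio` threshold are illustrative assumptions, not etcd APIs. It assumes per-member disk and peer-network latency samples are already available (for example, collected from member metrics) and only decides whether leadership should move and to which member.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class MemberLatency:
    """Hypothetical per-member health sample (all fields are assumptions)."""
    member_id: int
    disk_p99_ms: float     # assumed: 99th percentile disk sync latency
    network_p99_ms: float  # assumed: 99th percentile peer round-trip time

    @property
    def score(self) -> float:
        # Simple combined cost; a real policy would likely weigh these differently.
        return self.disk_p99_ms + self.network_p99_ms


def pick_transferee(
    members: Sequence[MemberLatency],
    leader_id: int,
    ratio: float = 2.0,
) -> Optional[int]:
    """Return the member ID to hand leadership to, or None to keep the current leader.

    Leadership moves only when the leader is the slowest member AND its score is at
    least `ratio` times the best follower's score, so small fluctuations do not
    trigger repeated elections.
    """
    by_id = {m.member_id: m for m in members}
    leader = by_id.get(leader_id)
    followers = [m for m in members if m.member_id != leader_id]
    if leader is None or not followers:
        return None
    slowest = max(members, key=lambda m: m.score)
    best = min(followers, key=lambda m: m.score)
    if slowest.member_id != leader_id:
        return None  # the leader is not the bottleneck
    if leader.score < ratio * best.score:
        return None  # leader is slower, but not by enough to justify an election
    return best.member_id
```

      With member 1 as a slow leader (combined score 70 ms) and followers at 7 ms and 11 ms, the sketch picks member 2. The ratio check adds hysteresis so normal jitter does not cause repeated elections; the actual handoff could use etcd's existing leader-transfer operation (for example `etcdctl move-leader`).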


      2) When an etcd client connects to an etcd member, the session remains pinned to that member for the duration of its transactions. The client will not use any other member unless it detects a failure on the member it is pinned to. If the client is pinned to a member with suboptimal conditions (e.g. high network latency or high disk latency), it will receive slow or degraded responses from the etcd cluster. Because a follower forwards write requests to the leader, its network latency is paid twice: once between the client and the member, and again between the member and the leader. We should provide a mechanism for members experiencing suboptimal conditions (compared to the other etcd members) to return a message or notification to the client, so the client can choose a different member. If the suboptimal conditions persist, the member should stop serving clients (port 2379) while remaining in the cluster and communicating with its peers (port 2380).
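      Scenario 2 combines a member-side signal with client-side endpoint selection. The sketch below is hypothetical throughout: `DegradationTracker`, `choose_endpoint`, the `factor`, `notify_after` and `stop_after` parameters, and the endpoint names are assumptions, not existing etcd behavior. A member compares its own latency with the median of its peers, asks clients to move after a slow window, and recommends that it stop serving client traffic on port 2379 if the condition persists. The client stays pinned to its endpoint until that endpoint reports itself degraded.

```python
from statistics import median
from typing import Dict, List

HEALTHY = "healthy"
NOTIFY = "notify"              # ask pinned clients to move to another member
STOP_SERVING = "stop_serving"  # stop serving clients on 2379, keep peer traffic on 2380


class DegradationTracker:
    """Hypothetical member-side check comparing this member's latency with its peers."""

    def __init__(self, factor: float = 2.0, notify_after: int = 1, stop_after: int = 5) -> None:
        self.factor = factor              # how much slower than the peer median counts as slow
        self.notify_after = notify_after  # consecutive slow windows before notifying clients
        self.stop_after = stop_after      # consecutive slow windows before leaving client duty
        self.streak = 0

    def observe(self, own_ms: float, peer_ms: List[float]) -> str:
        """Record one observation window and return the recommended state."""
        if peer_ms and own_ms > self.factor * median(peer_ms):
            self.streak += 1
        else:
            self.streak = 0  # any healthy window resets the streak
        if self.streak >= self.stop_after:
            return STOP_SERVING
        if self.streak >= self.notify_after:
            return NOTIFY
        return HEALTHY


def choose_endpoint(current: str, degraded: Dict[str, bool]) -> str:
    """Hypothetical client-side selection: stay pinned unless the current endpoint
    reports degraded; then switch to the first healthy endpoint, if any."""
    if not degraded.get(current, False):
        return current
    for endpoint, is_degraded in degraded.items():
        if not is_degraded:
            return endpoint
    return current  # every member is degraded; keep the existing connection
```

      Keeping notification and withdrawal as separate thresholds mirrors the two steps in the requirement: a brief slowdown only redirects clients, while a sustained one takes the member out of client service without removing it from the cluster.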


      Note: The work should consider https://github.com/etcd-io/etcd/issues/14501

      Requirements (aka. Acceptance Criteria):

      • This feature should have a corresponding CI test to validate that it remains operational.


            William Caban (wcabanba@redhat.com)
            Matthew Werner
