Feature
Resolution: Unresolved
Major
BU Product Work
100% To Do, 0% In Progress, 0% Done
Program Call
Feature Overview
Create a main documentation section for the control plane that consolidates the information that’s now spread across multiple sections and articles, so that users can find all the required information on one landing page, similar to the Nodes section.
Problem to solve
The Control Plane team regularly gets the same queries about etcd latency, stretched-cluster recommendations, and performance, because the answers aren't clear in the existing documentation.
With the increased popularity of and demand for multi-site topologies to support OpenShift Virtualization, and with 4/5-node control plane support, more questions related to this are arising from the field, and we must provide clear answers to support the field in architectural and specification decisions.
Existing documentation
- 4/5-node control plane (current section: Scalability and Performance)
- Recommended etcd practices (current section: Scalability and Performance)
- Optimizing storage (current section: Scalability and Performance)
- etcd tasks (current section: Postinstallation Configuration)
- Backing up etcd (current section: Backup and Restore)
Articles
These documents contain some of the most common questions:
- What's the latency tolerated by etcd nodes?
- Can I use stretched clusters?
- How do I use multiple sites?
- What's the impact on the API server?
We must include this information clearly in the downstream documentation:
- Understanding etcd and the tunables/conditions affecting performance
- Does OpenShift 4.x have the same stretch cluster latency requirement as OpenShift 3.11?
These articles cover crucial information, like the following, that's not available in the downstream documentation:
The combined disk and network latency and jitter must maintain an etcd peer round trip time of less than 100ms. This is NOT the same as the network round trip time. See the ETCD timers in OpenShift section below.
Layered products (e.g., storage providers) may have lower latency requirements. In those cases, the latency limits are dictated by the requirements of the architecture supported by the layered product. For example, OpenShift cluster deployments that ‘span’ multiple data centers with Red Hat OpenShift Data Foundation must have a latency requirement of less than 10ms RTT. For those cases, follow the specific product guidance.
A low latency network (less than 2 ms of latency with v3, and less than 10 ms of RTT latency with v4) between instances (systems) across all sites. This requirement is driven by etcd and is needed to ensure stability and quorum (no loss of leaders).
A high-bandwidth network (with at least 5-10 Gbps capabilities) is also needed.
The value of the heartbeat interval should be around the maximum of the average round-trip time (RTT) between members, normally around 1.5x the RTT. With the OpenShift Container Platform default heartbeat interval of 100ms, the recommended RTT between control plane nodes is less than ~33ms, with a maximum of less than 66ms (66ms x 1.5 = 99ms).
For example, a network with a maximum latency of 80ms and jitter of 30ms will experience latencies of 110ms, which means etcd misses heartbeats, causing request timeouts and temporary leader loss.
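The heartbeat arithmetic above can be sketched as a quick check. This is a minimal illustration, assuming the OpenShift Container Platform default heartbeat interval of 100 ms; the function names and structure are illustrative, not part of any product API:

```python
# Sketch: check whether a network's worst-case latency fits within the
# etcd heartbeat interval, per the ~1.5x rule of thumb described above.

DEFAULT_HEARTBEAT_MS = 100  # OpenShift Container Platform default

def worst_case_rtt(max_latency_ms: float, jitter_ms: float) -> float:
    """Worst-case round-trip time: base latency plus jitter."""
    return max_latency_ms + jitter_ms

def fits_heartbeat(max_latency_ms: float, jitter_ms: float,
                   heartbeat_ms: float = DEFAULT_HEARTBEAT_MS) -> bool:
    """etcd starts missing heartbeats when the worst-case RTT meets or
    exceeds the heartbeat interval."""
    return worst_case_rtt(max_latency_ms, jitter_ms) < heartbeat_ms

# Since the heartbeat should be ~1.5x the RTT, the RTT should stay under
# heartbeat / 1.5 (about 66 ms for the 100 ms default).
max_recommended_rtt = DEFAULT_HEARTBEAT_MS / 1.5

# The example from the text: 80 ms latency + 30 ms jitter = 110 ms,
# which exceeds the 100 ms heartbeat interval.
print(worst_case_rtt(80, 30))   # 110
print(fits_heartbeat(80, 30))   # False
```

A downstream section could pair this kind of worked example with the heartbeat-interval prose so readers can plug in their own site-to-site measurements.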
If the 99th percentile of fsync duration is greater than the recommended value of 20 ms, faster disks are recommended to host etcd for better performance.
Use Prometheus to track the metric histogram_quantile(0.99, rate(etcd_network_peer_round_trip_time_seconds_bucket[2m])), which reports the round trip time for etcd to finish replicating client requests between members; it should be less than 50 ms.
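As a sketch of how a check like this could be automated, the query above can be evaluated via the standard Prometheus HTTP API instant-query endpoint (/api/v1/query). The server URL and helper functions here are illustrative assumptions, not part of any documented tooling:

```python
import urllib.parse

# The peer round-trip-time query from the text; the result should stay
# under the recommended 50 ms.
PEER_RTT_QUERY = (
    "histogram_quantile(0.99, "
    "rate(etcd_network_peer_round_trip_time_seconds_bucket[2m]))"
)
PEER_RTT_LIMIT_SECONDS = 0.050

def prometheus_query_url(base_url: str, promql: str) -> str:
    """Build a Prometheus HTTP API instant-query URL.
    /api/v1/query is the standard Prometheus endpoint; base_url is a
    placeholder for your monitoring stack's address."""
    query_string = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query_string

def peer_rtt_ok(observed_seconds: float) -> bool:
    """True when the observed p99 peer RTT is within the recommendation."""
    return observed_seconds < PEER_RTT_LIMIT_SECONDS

# Example: a 30 ms observed p99 RTT is within the 50 ms recommendation.
print(peer_rtt_ok(0.030))  # True
print(prometheus_query_url("http://prometheus.example:9090", PEER_RTT_QUERY))
```

The same pattern applies to the fsync recommendation above, with the threshold swapped for 20 ms and the appropriate disk-duration histogram.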
These data points need to be clear and concise in the downstream documentation, and easily found in a section dedicated to the control plane, as they are critical for running a healthy and stable cluster.
The following blog post describes selectable etcd profiles, which tolerate even higher latency:
https://www.redhat.com/en/blog/introducing-selectable-profiles-for-etcd
We need to reconcile and consolidate all of this information, because the field keeps catching up with it piecemeal, which can be confusing.