"etcdserver: leader changed" causes clients to fail.
This error should never bubble up to clients: the kube-apiserver can always retry this failure mode, since it knows the data was not modified. However, when etcd lengthens its leader-election and heartbeat timeouts for slow hardware such as Azure, the hardcoded retry limits in the kube-apiserver/etcd client are exceeded. See:
- kube-apiserver tries to use etcd retries: https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/storage/storagebackend/factory/etcd3.go#L308-L317
- etcd retries appear to be unconditionally added: https://github.com/etcd-io/etcd/blob/main/client/v3/client.go#L243-L249 and https://github.com/etcd-io/etcd/blob/release-3.5/client/v3/client.go#L286
- etcd retries retry a max of 2.5 seconds: https://github.com/etcd-io/etcd/blob/main/client/v3/options.go#L53 + https://github.com/etcd-io/etcd/blob/main/client/v3/options.go#L45
- etcd retries are further reduced by the zero-second retry applied to quorum requests
- On Azure (https://github.com/openshift/cluster-etcd-operator/blob/d7d43ee21aff6b178b2104228bba374977777a84/pkg/etcdenvvar/etcd_env.go#L229), the slower leader-change reactions (https://github.com/openshift/cluster-etcd-operator/blob/master/pkg/hwspeedhelpers/hwhelper.go#L28) mean we are likely to exceed the retry budget for requests issued near the beginning of a leader change
Simply saying "oh, it's hardcoded in kube" isn't good enough. We have previously carried a storage shim to retry exactly this class of problem. If all else fails, bringing back that small shim to retry Unavailable etcd errors for longer is an appropriate fix for all affected clients.
Additionally, this etcd capability is being made more widely available, and this bug prevents that from working.