Bug
Resolution: Done
Undefined
None
4.17.0
Quality / Stability / Reliability
False
Payloads are consistently failing the `etcd should not log excessive took too long messages` test. The etcd-operator job logs show recurring errors such as:
E0802 22:35:21.192895 1 base_controller.go:268] EtcdEndpointsController reconciliation failed: EtcdEndpointsController can't evaluate whether quorum is safe: etcd cluster has quorum of 2 and 2 healthy members which is not fault tolerant: [{Member:ID:5121106053231799069 name:"ci-op-s7f1kx4n-dc81f-4jktl-master-1" peerURLs:"https://10.0.0.6:2380" clientURLs:"https://10.0.0.6:2379" Healthy:true Took:2.608637ms Error:<nil>} {Member:ID:9603999942119413184 name:"ci-op-s7f1kx4n-dc81f-4jktl-master-2" peerURLs:"https://10.0.0.8:2380" clientURLs:"https://10.0.0.8:2379" Healthy:true Took:3.007143ms Error:<nil>} {Member:ID:17824506270444741063 name:"ci-op-s7f1kx4n-dc81f-4jktl-master-0" peerURLs:"https://10.0.0.7:2380" clientURLs:"https://10.0.0.7:2379" Healthy:false Took: Error:create client failure: failed to make etcd client for endpoints [https://10.0.0.7:2379]: context deadline exceeded}]
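The error above is a quorum fault-tolerance check. For a 3-member cluster, etcd's quorum is a strict majority, floor(3/2) + 1 = 2. With master-0 unreachable only 2 members are healthy, so losing one more would drop the cluster below quorum. The sketch below illustrates only that arithmetic. The `member` type and the `quorumSize` and `isFaultTolerant` functions are hypothetical and are not the cluster-etcd-operator's actual implementation.

```go
package main

import "fmt"

// member is a hypothetical, simplified view of a member health result,
// loosely modeled on the Name/Healthy fields printed in the log line.
type member struct {
	Name    string
	Healthy bool
}

// quorumSize returns the strict majority etcd needs for n voting members:
// floor(n/2) + 1.
func quorumSize(n int) int {
	return n/2 + 1
}

// isFaultTolerant reports whether the cluster could lose one more healthy
// member and still keep quorum, i.e. healthy members exceed the quorum size.
func isFaultTolerant(members []member) bool {
	healthy := 0
	for _, m := range members {
		if m.Healthy {
			healthy++
		}
	}
	return healthy > quorumSize(len(members))
}

func main() {
	// Mirrors the failing case in the log: 3 members, master-0 unreachable.
	members := []member{
		{Name: "master-1", Healthy: true},
		{Name: "master-2", Healthy: true},
		{Name: "master-0", Healthy: false},
	}
	fmt.Printf("quorum=%d faultTolerant=%v\n", quorumSize(len(members)), isFaultTolerant(members))
}
```

Running it prints `quorum=2 faultTolerant=false`, matching the "quorum of 2 and 2 healthy members which is not fault tolerant" message. With all three members healthy (3 > 2), the check would pass.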
Note that https://github.com/openshift/cluster-etcd-operator/pull/1309 landed around the time the failures started. I will test a revert to see whether results improve without it.