-
Bug
-
Resolution: Done-Errata
-
Critical
-
4.15
-
Critical
-
Yes
-
5
-
ETCD Sprint 245, ETCD Sprint 246, ETCD Sprint 247
-
3
-
Approved
-
False
-
-
N/A
-
Release Note Not Required
Description of problem:
Based on this and this Component Readiness data, which compare success rates for those two particular tests, we are regressing by ~7-10% between the current 4.15 master and 4.14.z (in other words, we made the product ~10% worse).
These jobs and their failures are all caused by increased etcd leader elections disrupting seemingly unrelated test cases across the vSphere AMD64 platform.
Since this particular platform's business significance is high, I'm setting this as "Critical" severity.
Please get in touch with me or dwest@redhat.com if more teams need to be pulled into the investigation and mitigation.
Version-Release number of selected component (if applicable):
4.15 / master
How reproducible:
Component Readiness Board
Actual results:
The etcd leader elections are elevated. Some jobs indicate this is due to disk I/O throughput or network overload.
Expected results:
1. We NEED to understand what is causing this problem.
2. If we can mitigate this, we should.
3. If we cannot mitigate this, we need to document it or work with the vSphere infrastructure provider to fix the problem.
4. Optionally, we need a way to measure how often this happens in our fleet so we can evaluate how bad it is.
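For item 4, one possible way to quantify the problem is to turn scraped counter samples into a leader-changes-per-hour rate. The sketch below is illustrative only: it assumes the samples come from etcd's `etcd_server_leader_changes_seen_total` counter for a single member, and the function names and the `(timestamp, value)` sample format are hypothetical, not part of any existing tooling.

```python
from typing import Iterable, List, Tuple

# One sample: (unix timestamp in seconds, counter value).
# Assumption: values come from etcd_server_leader_changes_seen_total
# for one etcd member.
Sample = Tuple[float, float]


def leader_changes(samples: Iterable[Sample]) -> float:
    """Sum the counter increases across samples, tolerating counter resets."""
    ordered: List[Sample] = sorted(samples)
    total = 0.0
    for (_, prev), (_, cur) in zip(ordered, ordered[1:]):
        if cur >= prev:
            total += cur - prev
        else:
            # The counter dropped, so the member restarted; count from zero.
            total += cur
    return total


def leader_changes_per_hour(samples: Iterable[Sample]) -> float:
    """Normalize the observed leader changes to an hourly rate."""
    ordered: List[Sample] = sorted(samples)
    if len(ordered) < 2:
        return 0.0
    span_seconds = ordered[-1][0] - ordered[0][0]
    if span_seconds <= 0:
        return 0.0
    return leader_changes(ordered) * 3600.0 / span_seconds
```

Comparing this rate between 4.14.z and 4.15 jobs, or across clusters in the fleet, would give a rough measure of how often elections occur. What rate counts as elevated is a judgment call that this sketch does not settle.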
Additional info:
- blocks
-
OCPBUGS-27151 [regression] increased etcd leader elections significantly impacting vsphere amd64 platform
- Closed
- depends on
-
OCPBUGS-27094 [regression] increased etcd leader elections significantly impacting vsphere amd64 platform
- Closed
- is cloned by
-
OCPBUGS-27151 [regression] increased etcd leader elections significantly impacting vsphere amd64 platform
- Closed
- relates to
-
TRT-1436 Investigate and tune disruption alerts
- Closed
-
TRT-1353 Investigate Sep 22 Azure OpenShift API Disruption Regression
- Closed
-
TRT-1370 Break etcd leadership intervals out of pod logs section in chart
- Closed
- links to
-
RHSA-2023:7198 OpenShift Container Platform 4.15 security update