-
Epic
-
Resolution: Unresolved
Improve vSphere CI Job Stability and Reliability
-
Quality / Stability / Reliability
-
63% To Do, 0% In Progress, 38% Done
Epic Goal
Address the issues that are lowering pass rates in vSphere CI jobs, so that CI for OpenShift on vSphere becomes more stable and reliable.
Why is this important?
vSphere is the platform with the largest number of OpenShift deployments, so we need a consistent signal on the stability of OpenShift on vSphere. CI outages impair TRT's ability to assess regressions.
Scenarios
- Implement Cache Pod Rollout on Cert Rotation
- Add Cache Health Monitoring and Alerting
- Enhance vSphere Capacity Manager with Port Group Conflict Alerting
- Improve CI Failure Clustering Visualization
- Decouple CI Job Dependency on Specific vSphere Capacity Manager Pools
- Implement Hardware Degradation Alerting
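As an illustration of the first scenario only, here is a minimal sketch of one common way to roll pods when a certificate rotates: record a hash of the certificate as a pod-template annotation, and patch it when the hash changes. It assumes the cache runs as a Kubernetes Deployment, which this epic does not state. The annotation key and the function names are hypothetical, and the sketch only builds the patch body; it does not contact a cluster.

```python
import hashlib

# Hypothetical annotation key, chosen for illustration only.
CERT_HASH_ANNOTATION = "example.invalid/cert-sha256"


def cert_fingerprint(cert_pem: bytes) -> str:
    """Return the SHA-256 hex digest of the certificate bytes."""
    return hashlib.sha256(cert_pem).hexdigest()


def rollout_patch(current_annotations: dict, cert_pem: bytes):
    """Return a Deployment patch body if the certificate changed, else None.

    current_annotations is assumed to be the pod template's existing
    annotations (spec.template.metadata.annotations).
    """
    new_hash = cert_fingerprint(cert_pem)
    if current_annotations.get(CERT_HASH_ANNOTATION) == new_hash:
        return None  # certificate unchanged, no rollout needed
    # Changing a pod-template annotation causes the Deployment to roll its pods.
    return {
        "spec": {
            "template": {
                "metadata": {"annotations": {CERT_HASH_ANNOTATION: new_hash}}
            }
        }
    }
```

A controller or CI job could call `rollout_patch` on each certificate check and apply the returned body only when it is not `None`, which keeps unchanged certificates from restarting the cache.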
Acceptance Criteria
Dependencies (internal and external)
- ...
Previous Work (Optional):
- …
Open questions:
- …
Done Checklist
- CI - CI is running, tests are automated and merged.
- Release Enablement <link to Feature Enablement Presentation>
- DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
- DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
- DEV - Downstream build attached to advisory: <link to errata>
- QE - Test plans in Polarion: <link or reference to Polarion>
- QE - Automated tests merged: <link or reference to automated tests>
- DOC - Downstream documentation merged: <link to meaningful PR>