Epic
Resolution: Unresolved
Major
RHDH CI Optimization
To Do
QE Needed, Docs Needed, TE Needed, Customer Facing, PX Needed
88% To Do, 13% In Progress, 0% Done
EPIC Goal
Reduce RHDH PR presubmit CI feedback time from ~31 minutes to <15 minutes (P50) by optimizing the existing OCP Helm deploy pipeline. PR checks continue to use a full OCP deployment with an RHDH build; all optimizations target pipeline efficiency.
Background
Evidence from Prow run Build ID 2023822693903634432 (job: pull-ci-redhat-developer-rhdh-main-e2e-ocp-helm):
| Phase | Duration | % of Total |
|---|---|---|
| Pre | 25s | 1% |
| Test (deploy + Playwright) | 20m 06s | 65% |
| of which Playwright only (showcase + showcase-rbac) | ~10m | 32% |
| Post (gather-extra + must-gather) | 8m 26s | 27% |
| Total | ~30m 59s | 100% |
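The percentages in the table can be re-derived from the durations. The sketch below is only a sanity check on the numbers above; `to_seconds` is a hypothetical helper, not part of the pipeline.

```python
import re


def to_seconds(text: str) -> int:
    """Convert a duration such as '20m 06s', '1m43s' or '25s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?\s*(?:(\d+)s)?", text.strip())
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return minutes * 60 + seconds


# Durations copied from the table (the total is approximate in the source).
total = to_seconds("30m 59s")
phases = {"Pre": "25s", "Test": "20m 06s", "Post": "8m 26s"}

# Share of the total per top-level phase, rounded to whole percent.
shares = {name: round(100 * to_seconds(d) / total) for name, d in phases.items()}

# Time in the total that is not attributed to Pre, Test, or Post.
unattributed = total - sum(to_seconds(d) for d in phases.values())

print(shares, unattributed)
```

The three top-level phases do not add up to the full total, so a small part of the run is not attributed to any phase in the table. The Playwright figure is part of the Test phase and is not counted again.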
Key findings:
- The post phase spends 8m 26s on artifact collection even when the job succeeds (gather-extra: 6m, must-gather: 1m43s)
- ~5 min is spent waiting for Backstage readiness after the Helm deploy (HTTP 503 loop)
- ~1-2 min is spent installing the OpenShift Pipelines operator, which is needed only for Tekton nightly tests, not for PR checks
- ~10 min is spent running the Playwright suites sequentially (showcase + showcase-rbac in separate namespaces)
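A rough savings budget helps show how far these findings could move the total. This is a back-of-the-envelope sketch, not a measurement: the savings for the readiness wait and for Playwright parallelism are assumptions (anywhere from nothing up to the full ~5 min wait, and up to half of the ~10 min Playwright time if both suites ran fully concurrently).

```python
TOTAL_S = 30 * 60 + 59   # measured run: ~30m 59s
TARGET_S = 15 * 60       # P50 target: <15 min
POST_S = 8 * 60 + 26     # post phase on a successful run

# (conservative, optimistic) savings in seconds per optimization.
savings = {
    # Post phase brought down to the 1 min target on success.
    "gather_on_failure_only": (POST_S - 60, POST_S - 60),
    # ~1-2 min operator install skipped for PR jobs.
    "skip_pipelines_operator": (60, 120),
    # ASSUMPTION: between none and all of the ~5 min readiness wait.
    "backstage_readiness_wait": (0, 5 * 60),
    # ASSUMPTION: at best, running both suites concurrently halves ~10 min.
    "parallel_playwright": (0, 5 * 60),
}

needed = TOTAL_S - TARGET_S
low = sum(lo for lo, _ in savings.values())
high = sum(hi for _, hi in savings.values())

print(f"needed: {needed}s, conservative: {low}s, optimistic: {high}s")
print(f"after conservative savings: {(TOTAL_S - low) / 60:.1f} min")
print(f"after optimistic savings: {(TOTAL_S - high) / 60:.1f} min")
```

Under these assumptions, the post-phase and operator savings alone leave the run at roughly 22 to 23 minutes. Reaching <15 min also requires most of the readiness-wait and Playwright savings.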
Why is this important?
- Developer velocity: 31-min feedback loops slow iteration
- CI resource waste: cluster time, compute, and operator installs are repeated unnecessarily
- Flake impact: there is no structured quarantine process, so flaky failures cost additional re-run time
- Test pyramid is inverted: ~80% E2E, ~15% integration, ~5% unit, with no coverage metrics
Target Outcomes
| Metric | Current | Target |
|---|---|---|
| PR feedback P50 | ~31 min | <15 min |
| PR feedback P90 | ~45 min | <20 min |
| Post-phase (pass) | 8m 26s | <1 min |
| Nightly pass rate | TBD | >90% |
| Flake rate | TBD | <5% |
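Several of these metrics are still TBD, so it may help to pin down how they would be computed. The sketch below is one possible definition, not an agreed one: the `Run` record and its fields are hypothetical, and a commit is counted as flaky when it has both a failed and a passed run.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class Run:
    commit: str          # commit SHA the job ran against (hypothetical field)
    duration_min: float  # wall-clock feedback time in minutes
    passed: bool


def p50_p90(runs: list[Run]) -> tuple[float, float]:
    """Return the P50 and P90 feedback time; needs at least two runs."""
    cuts = quantiles([r.duration_min for r in runs], n=10, method="inclusive")
    return cuts[4], cuts[8]  # 5th and 9th decile cut points


def flake_rate(runs: list[Run]) -> float:
    """Fraction of commits that both failed and passed (ASSUMED definition)."""
    outcomes: dict[str, set[bool]] = {}
    for run in runs:
        outcomes.setdefault(run.commit, set()).add(run.passed)
    if not outcomes:
        return 0.0
    flaky = sum(1 for seen in outcomes.values() if seen == {True, False})
    return flaky / len(outcomes)
```

Whatever definition is chosen, it should be fixed before the baseline is recorded so that the Current and Target columns are comparable.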
Approach
Three phases:
- Quick Wins (Weeks 1-4): Run gather steps only on failure, skip the OpenShift Pipelines operator install for PR jobs, tune Playwright parallelism, add a flake quarantine mechanism
- Medium-term (Weeks 5-10): Pre-warmed cluster pools with operators pre-installed, parallel deployment and testing of showcase + showcase-rbac
- Strategic (Weeks 11-20): Optional test impact selection, coverage pipeline with ReportPortal/Codecov integration
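The first two quick wins reduce to a small decision per job. The sketch below illustrates that logic only; it is not the actual pipeline scripts. `plan_steps` and its keys are hypothetical names, and it assumes the job type is available in a `JOB_TYPE` environment variable with the value `presubmit` for PR jobs.

```python
import os


def plan_steps(job_type: str, tests_passed: bool) -> dict[str, bool]:
    """Decide which optional steps to run for a given job and test outcome."""
    is_pr = job_type == "presubmit"
    return {
        # The operator is needed only for the Tekton nightly tests.
        "install_pipelines_operator": not is_pr,
        # Collect gather-extra / must-gather artifacts only when tests failed.
        "run_gather": not tests_passed,
    }


if __name__ == "__main__":
    # ASSUMPTION: JOB_TYPE is set by the CI system; default to a PR job.
    job_type = os.environ.get("JOB_TYPE", "presubmit")
    print(plan_steps(job_type, tests_passed=True))
```

Keeping the decision in one place makes it easy to confirm that nightly jobs still install the operator and that failed runs still collect full artifacts.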
Acceptance Criteria
- PR feedback time reduced to <15 min at P50
- Post-phase on success completes in <1 min
- Flake quarantine mechanism operational
- Pre-warmed cluster pool operational for at least one pool
- Coverage pipeline integrated with ReportPortal
References
- Prow run: Build ID 2023822693903634432
- Prow build log: artifacts/e2e-ocp-helm/build-log.txt
- Prow gather-extra log: artifacts/e2e-ocp-helm/gather-extra/build-log.txt
- Coverage Metrics Jira: RHDHPLAN-851