-
Epic
-
Resolution: Unresolved
-
Stabilize and Migrate WINC Tests to OTE
-
To Do
-
Quality / Stability / Reliability
-
100% To Do, 0% In Progress, 0% Done
Background
The WINC test suite (49 tests in openshift-tests-private/test/extended/winc/) needs to be migrated to the OpenShift Tests Extension (OTE) framework. However, OTE requires tests to maintain a >=99% pass rate before they can be included in conformance suites.
Current State:
- 49 Ginkgo-based integration tests
- Tests currently do not meet 99% pass rate requirement
- PR openshift/release#71726 has merged, adding periodic CI coverage on more platforms
- Tests already use the compat_otp library (a good foundation for OTE)
OTE Requirements:
Per the OTE Integration Guide, tests must pass at >=99% before inclusion in conformance suites.
Related Work:
- WINC-1508: Parallel execution and test optimization
- PR openshift/release#71726: New platform test coverage
Strategy
This epic is divided into TWO SEQUENTIAL PHASES:
Phase 1: Test Stabilization (MUST complete first)
- Monitor new platform test results from PR #71726
- Fix flaky tests and platform-specific issues
- Achieve >=99% pass rate across all platforms
- Validate sustained reliability for 2+ weeks
Estimated Duration: 10-12 weeks
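The stabilization gate above (>=99% pass rate, sustained for 2+ weeks) can be sketched as a small check. This is only an illustration: the `DayResult` type and both function names are hypothetical, and it assumes each day is evaluated on its own rather than aggregated over the whole window, which the epic does not specify. Integer arithmetic is used so the 99% comparison is exact.

```go
package main

// DayResult is a hypothetical record of one day's CI outcomes for the suite.
type DayResult struct {
	Passed int
	Total  int
}

// MeetsTarget reports whether passed/total is at least targetPercent.
// A day with no runs never meets the target.
func MeetsTarget(passed, total, targetPercent int) bool {
	return total > 0 && passed*100 >= total*targetPercent
}

// SustainedAbove reports whether each of the most recent `days` entries
// meets targetPercent. It returns false if there is not enough history.
func SustainedAbove(history []DayResult, targetPercent, days int) bool {
	if days <= 0 || len(history) < days {
		return false
	}
	for _, d := range history[len(history)-days:] {
		if !MeetsTarget(d.Passed, d.Total, targetPercent) {
			return false
		}
	}
	return true
}
```

For the Phase 1 exit criterion, a call such as `SustainedAbove(history, 99, 14)` would express "14 consecutive days at or above 99%".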
Phase 2: OTE Migration (Blocked by Phase 1)
- Create OTE binary (cmd/winc-tests/main.go)
- Add required annotations ([OTP], [Jira:Windows_Containers])
- Define test suites (parallel, serial, storage)
- Register in origin's extension registry
- Validate in CI with maintained >=99% pass rate
Estimated Duration: 2-3 weeks
Goals
- Achieve and maintain >=99% test pass rate on all platforms
- Migrate to OTE framework
- Enable component team ownership
- Support automatic CI integration
- Improve test execution efficiency
Success Metrics
- All tests at >=99% pass rate for 14+ consecutive days
- Successful OTE migration with maintained reliability
- Tests run in >=3 CI job variants (AWS, Azure, vSphere)
- Zero regression in pass rate post-migration
- Tests properly categorized in TestGrid
Reference Documentation
Phase 1: Test Stabilization
Phase 1A: Monitor New Platform Test Results (Weeks 1-2)
- Identify new platforms/variants from PR #71726
- Create TestGrid bookmarks and Sippy queries
- Collect baseline failure data over 2 weeks
- Create failure matrix (test × platform)
- Triage and categorize failures
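One way to build the failure matrix (test × platform) from collected results is sketched below. The `Result` record is a hypothetical flattened form of whatever is exported from TestGrid or Sippy; the epic does not define a data format, so the field names are assumptions.

```go
package main

import "sort"

// Result is a hypothetical flattened CI record: one test run on one platform.
type Result struct {
	Test     string
	Platform string
	Passed   bool
}

// FailureMatrix maps test name -> platform -> number of failed runs.
// Tests that never failed do not appear in the matrix.
type FailureMatrix map[string]map[string]int

// BuildFailureMatrix counts failed runs per test and platform.
func BuildFailureMatrix(results []Result) FailureMatrix {
	m := FailureMatrix{}
	for _, r := range results {
		if r.Passed {
			continue
		}
		if m[r.Test] == nil {
			m[r.Test] = map[string]int{}
		}
		m[r.Test][r.Platform]++
	}
	return m
}

// WorstTests returns test names ordered by total failures, descending.
// Ties are broken alphabetically so the order is deterministic.
func (m FailureMatrix) WorstTests() []string {
	totals := map[string]int{}
	names := make([]string, 0, len(m))
	for name, byPlatform := range m {
		names = append(names, name)
		for _, n := range byPlatform {
			totals[name] += n
		}
	}
	sort.Slice(names, func(i, j int) bool {
		if totals[names[i]] != totals[names[j]] {
			return totals[names[i]] > totals[names[j]]
		}
		return names[i] < names[j]
	})
	return names
}
```

Sorting by total failures gives a simple triage order; a test that fails on a single platform only shows up as one non-zero cell in its row, which points at a platform-specific issue rather than a test bug.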
Phase 1B: Fix Failures and Achieve 99% (Weeks 3-10)
- Fix test bugs and quick wins (timing, race conditions, retries)
- Fix platform-specific issues (AWS, Azure, vSphere, GCP, Nutanix, None)
- Address product bugs (file Jiras, coordinate with dev team)
- Monitor for a sustained >=99% pass rate over 2 weeks
- Document baseline metrics
Phase 2: OTE Migration
Infrastructure Setup
- Vendor github.com/openshift-eng/openshift-tests-extension
- Create cmd/winc-tests/main.go CLI binary
- Initialize extension and build test specs
- Register OTE subcommands
Test Compliance
- Add [OTP] tracking tags to all 49 tests
- Add [Jira:Windows_Containers] ownership tags
- Add [Level0] tags to conformance tests
- Verify test name stability
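The compliance items above lend themselves to an automated check. The sketch below assumes the tags appear literally in the full test name and that a baseline list of names has been saved from before the migration. Both function names are hypothetical and not part of OTE.

```go
package main

import "strings"

// requiredTags are the annotations this epic requires on every test.
var requiredTags = []string{"[OTP]", "[Jira:Windows_Containers]"}

// MissingTags returns the required tags that do not appear in testName.
func MissingTags(testName string) []string {
	var missing []string
	for _, tag := range requiredTags {
		if !strings.Contains(testName, tag) {
			missing = append(missing, tag)
		}
	}
	return missing
}

// MissingFromCurrent returns baseline names that no longer exist in current.
// A non-empty result means a test was renamed or removed, which would break
// the continuity of its pass-rate history.
func MissingFromCurrent(baseline, current []string) []string {
	seen := make(map[string]bool, len(current))
	for _, n := range current {
		seen[n] = true
	}
	var gone []string
	for _, n := range baseline {
		if !seen[n] {
			gone = append(gone, n)
		}
	}
	return gone
}
```

Running both checks over all 49 test names before and after the migration would flag untagged tests and accidental renames early.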
Suite Organization
- Define windows/all suite (all 49 tests)
- Define windows/conformance/parallel suite
- Define windows/conformance/serial suite
- Define windows/storage suite
- Add platform restrictions
Build & Distribution
- Configure Makefile for OTE binary build
- Update Dockerfile to include binary
- Generate bindata.go for test resources
CI Integration
- Register binary in origin's extension registry
- Verify automatic CI job inclusion
- Monitor TestGrid results
- Update ci-test-mapping
Documentation
- Update README with OTE usage
- Document suite structure
- Document platform requirements
Dependencies
- Phase 2 is BLOCKED by Phase 1 completion
- Related to WINC-1508 (parallel execution optimization)
Risks & Mitigation
| Risk | Impact | Mitigation |
|---|---|---|
| New platform failures delay stabilization | High | Early monitoring, rapid triage |
| Product bugs block progress | High | Mark tests as informing if needed |
| Tests break during OTE migration | Medium | Thorough local testing first |
| CI jobs don't pick up new binary | Medium | Use multi-PR testing |
Total Estimated Effort
- Phase 1: 10-12 weeks
- Phase 2: 2-3 weeks
- Total: 12-15 weeks (3-4 months)