Story
Resolution: Unresolved
Major
Quality / Stability / Reliability
Story Summary
Enable BYOH E2E tests to run in Prow CI by implementing pre-provisioned node support and parallel test execution.
Current State: BYOH tests cannot run in Prow CI due to excessive runtime (~100 minutes for the five-test serial run) and resource constraints.
Target State: BYOH tests run in Prow CI in ~20 minutes using pre-provisioned infrastructure with partial parallelization.
Problem Statement
BYOH tests are currently blocked from Prow CI migration because:
Runtime Issues:
- Each test provisions Windows VMs for 15+ minutes via MachineSet creation
- 5 tests running serially = ~100 minutes total runtime
- Exceeds Prow job time limits and resource quotas
Infrastructure Issues:
- Tests are marked [Disruptive] requiring serial execution
- Node provisioning and destruction creates resource churn
- No mechanism to reuse nodes between test runs
Impact:
- BYOH tests remain in legacy Jenkins infrastructure
- No automated CI coverage for BYOH scenarios
- Slower feedback loop for Windows Container development
Solution Approach
Architecture Changes
Pre-Provisioned Infrastructure:
- Terraform provisions 2 Windows nodes before test execution
- Nodes are registered in windows-instances ConfigMap
- Tests detect and use existing nodes instead of provisioning new ones
Framework Enhancements:
- extractAddressesFromConfigMap() - Retrieve pre-provisioned node addresses
- Refactored setBYOH() - Dual-mode execution (pre-provisioned vs on-demand)
- remediateBYOHNodes() - Fast cleanup without node destruction
Execution Strategy:
- 1 test runs in parallel (OCP-42484)
- 3 tests run serially (OCP-42496, OCP-44099, OCP-82694)
- Total runtime: ~20 minutes
Implementation Details
Modified Components
Repository: openshift-tests-private
Primary File: test/extended/winc/utils.go
New Functions
```go
// Extract addresses from pre-provisioned ConfigMap
func extractAddressesFromConfigMap(oc *exutil.CLI) ([]string, error)

// Clean nodes between test runs without deconfiguration
func remediateBYOHNodes(oc *exutil.CLI, addresses []string, privateKey string, iaasPlatform string)
```
Refactored Functions
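```go
// Now supports dual-mode execution
func setBYOH(oc *exutil.CLI, iaasPlatform string, addressesType []string, machinesetName string, winVersion string) []string {
	// Check if windows-instances ConfigMap exists
	if configMapExists() {
		// Pre-Provisioned Mode (Prow CI)
		// extractAddressesFromConfigMap returns ([]string, error); both values must be received
		addresses, err := extractAddressesFromConfigMap(oc)
		if err != nil {
			// fail the test here with a clear error (handling elided)
		}
		// nodeName is resolved from the returned addresses (lookup elided)
		waitWindowsNodeReady(oc, nodeName, 11*time.Minute)
		return addresses
	}
	// Provisioning Mode (Local/Jenkins)
	configureMachineset(...)
	waitForMachinesetReady(...)
	// addresses are collected from the newly provisioned machines (elided)
	return addresses
}
```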
ConfigMap Structure
```yaml
# Created by Terraform in Prow CI
apiVersion: v1
kind: ConfigMap
metadata:
  name: windows-instances
  namespace: openshift-windows-machine-config-operator
data:
  "10.0.1.100": "username=Administrator"
  "10.0.1.101": "username=Administrator"
  "10.0.1.102": "username=Administrator"
```
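The entries in the ConfigMap above map an address to a `username=<name>` value. A minimal sketch of how such data could be turned into a sorted address list follows. The `instance` type and `parseInstances` helper are hypothetical names introduced for illustration, and the sketch works on a plain `map[string]string` rather than on the cluster API, so it is not the actual `extractAddressesFromConfigMap()` implementation.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// instance is a hypothetical in-memory view of one windows-instances entry.
type instance struct {
	Address  string // ConfigMap key: IP address or DNS name
	Username string // parsed from the "username=<name>" value
}

// parseInstances converts ConfigMap data into a slice sorted by address.
// It returns an error for empty data or for values that do not match
// the "username=<name>" form.
func parseInstances(data map[string]string) ([]instance, error) {
	if len(data) == 0 {
		return nil, fmt.Errorf("windows-instances ConfigMap has no entries")
	}
	out := make([]instance, 0, len(data))
	for addr, val := range data {
		val = strings.TrimSpace(val)
		if !strings.HasPrefix(val, "username=") || val == "username=" {
			return nil, fmt.Errorf("entry %q: expected username=<name>, got %q", addr, val)
		}
		out = append(out, instance{
			Address:  addr,
			Username: strings.TrimPrefix(val, "username="),
		})
	}
	// Map iteration order is random in Go, so sort for deterministic output.
	sort.Slice(out, func(i, j int) bool { return out[i].Address < out[j].Address })
	return out, nil
}

func main() {
	data := map[string]string{
		"10.0.1.100": "username=Administrator",
		"10.0.1.101": "username=Administrator",
		"10.0.1.102": "username=Administrator",
	}
	instances, err := parseInstances(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, in := range instances {
		fmt.Printf("%s (user %s)\n", in.Address, in.Username)
	}
}
```

Sorting the result keeps node assignment stable between runs, which matters if tests pick nodes by index.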
Test Analysis
Affected Tests
| Test ID | Name | Parallel? | Runtime | Notes |
|---|---|---|---|---|
| OCP-42484 | BYOH Configure with IP | Yes | 5 min | Isolated namespace, parallel-safe |
| OCP-42496 | BYOH Deconfiguration | No | 5 min | Deletes ConfigMap mid-test |
| OCP-44099 | SSH Key Rotation | No | 5 min | Modifies cluster-wide secret |
| OCP-82694 | Container Image Mirroring | No | 5 min | Creates cluster-wide IDMS |
| OCP-42516 | BYOH IP+DNS Dual Addressing | N/A | N/A | N/A |
Test Execution Flow
Pre-Provisioned Mode (Prow CI):
Terraform: Provision 2 Windows nodes → 15 minutes (one-time)
↓
Parallel Phase:
- OCP-42484 (node 1) → 5 minutes
↓
Serial Phase:
- OCP-42496 (node 2) → 5 minutes
- OCP-44099 (node 2) → 5 minutes (after remediation)
- OCP-82694 (node 3) → 5 minutes
Total: ~35 minutes (15 min provision + 20 min tests)
Per-run: ~20 minutes (provisioning amortized across runs)
Provisioning Mode (Local/Jenkins):
Each test provisions its own node:
- OCP-42484: 15 min provision + 5 min test
- OCP-42496: 15 min provision + 5 min test
- OCP-44099: 15 min provision + 5 min test
- OCP-82694: 15 min provision + 5 min test
Total: ~80 minutes (sequential)
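The totals in both flows follow from two figures used throughout this story: roughly 15 minutes to provision and roughly 5 minutes per test. The sketch below reproduces that arithmetic. The function names are illustrative only, and the constants are the estimates quoted above, not measured values.

```go
package main

import "fmt"

const (
	provisionMin = 15 // estimated minutes to provision Windows nodes
	testMin      = 5  // estimated minutes per BYOH test
)

// provisioningModeTotal models the Local/Jenkins flow: every test
// provisions its own node and the tests run one after another.
func provisioningModeTotal(tests int) int {
	return tests * (provisionMin + testMin)
}

// preProvisionedTotal models the Prow CI flow: a single up-front
// provisioning step followed by the tests themselves.
func preProvisionedTotal(tests int) int {
	return provisionMin + tests*testMin
}

func main() {
	fmt.Println(provisioningModeTotal(4)) // 80
	fmt.Println(preProvisionedTotal(4))   // 35
	fmt.Println(provisioningModeTotal(5)) // 100 (five-test legacy run)
}
```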
Acceptance Criteria
Framework Functionality
- [ ] extractAddressesFromConfigMap() retrieves all node addresses from ConfigMap
- [ ] setBYOH() detects pre-provisioned nodes when ConfigMap exists
- [ ] setBYOH() falls back to MachineSet provisioning when ConfigMap doesn't exist
- [ ] setBYOH() fails immediately with clear error if ConfigMap exists but is empty
- [ ] remediateBYOHNodes() completes cleanup in <1 minute
- [ ] ConfigMap is preserved in pre-provisioned mode (nodes stay configured)
- [ ] ConfigMap is deleted in provisioning mode (nodes get deconfigured)
Performance Targets
- [ ] Pre-provisioned node wait timeout: 11 minutes (reduced from 15)
- [ ] Total test suite runtime in Prow CI: ≤20 minutes
- [ ] Provisioning time per test in Prow: <1 minute
- [ ] Node remediation time: <1 minute
Parallel Execution
- [ ] OCP-42484 can run in parallel with other compatible tests
- [ ] Node assignment prevents conflicts between parallel tests
- [ ] Remediation is node-isolated (doesn't affect other tests)
Backward Compatibility
- [ ] All BYOH tests run successfully in local environments (provisioning mode)
- [ ] No changes required to existing test code in initial implementation
- [ ] Clear logging indicates execution mode (Pre-Provisioned vs Provisioning)
Prow CI Integration
- [ ] BYOH tests execute successfully in Prow CI environment
- [ ] Tests use pre-provisioned infrastructure correctly
- [ ] Runtime is within Prow CI acceptable limits
- [ ] Full Windows Containers test suite migrated to Prow CI
Dependencies
Infrastructure
Terraform Provisioner:
- PR #6
- Provisions Windows nodes before test execution
- Creates windows-instances ConfigMap with node addresses
Prow Configuration:
- PR #71002
- Defines Prow job for BYOH tests
- Integrates Terraform provisioning step
Parent Epic
- WINC-1473 - BYOH Pre-Provisioned Workflow
Technical Notes
Detection Mechanism:
- ConfigMap-based only (no environment variables)
- Presence of windows-instances ConfigMap = pre-provisioned mode
- Absence of ConfigMap = provisioning mode
Timeout Adjustments:
- Pre-provisioned nodes: 11 minutes (nodes already exist)
- Provisioned nodes: 15 minutes (includes VM creation)
Critical Behavior:
- Deleting windows-instances ConfigMap triggers node deconfiguration
- In pre-provisioned mode, ConfigMap must be preserved for node reuse
- In provisioning mode, ConfigMap deletion is part of cleanup
Parallelization Limitations:
- 3 tests cannot run in parallel due to cluster-wide resource modifications
- Future work: Refactor these tests to enable full parallelization
Out of Scope
This Story Does NOT Include:
- Test code modifications (tests remain unchanged initially)
- Creating new Prow job definitions (separate infrastructure story)
- Refactoring non-parallel-safe tests (future enhancement)
- Deprecating OCP-42516 (separate cleanup story)
Performance Impact
| Metric | Before (Serial) | After (Prow CI) | Improvement |
|---|---|---|---|
| Provisioning per Test | 15 min | <1 min | 93% reduction |
| Total Runtime | ~100 min | ~20 min | 80% reduction |
| Parallel Tests | 0/5 (0%) | 1/4 (25%) | Partial parallelization |
| Node Wait Timeout | 15 min | 11 min | 4 min faster |
| Cleanup Time | 2 min | <1 min | 50% faster |
| Prow CI Ready | No | Yes | Migration unblocked |
Testing Strategy
Local Verification
- [ ] Run all BYOH tests locally (provisioning mode)
- [ ] Verify backward compatibility
- [ ] Confirm no regressions in existing functionality
Prow CI Verification
- [ ] Simulate pre-provisioned environment locally
- [ ] Verify ConfigMap detection works correctly
- [ ] Confirm node remediation leaves nodes in clean state
- [ ] Test parallel execution of OCP-42484
- [ ] Validate serial execution of remaining tests
Integration Testing
- [ ] End-to-end Prow CI job execution
- [ ] Terraform provisioning → test execution → cleanup
- [ ] Multi-run validation (verify node reuse works)
Definition of Done
- [ ] Code changes merged to openshift-tests-private
- [ ] All acceptance criteria met
- [ ] BYOH tests running successfully in Prow CI
- [ ] Runtime within acceptable limits (~20 minutes)
- [ ] Documentation updated (inline code comments)
- [ ] Local execution still works (backward compatible)
- [ ] Full Windows Containers test suite migrated to Prow CI
Follow-up Work
Future Enhancements:
- Refactor OCP-42496, OCP-44099, OCP-82694 for parallel execution
- Evaluate deprecation of OCP-42516 (redundant test)
- Optimize remediation for <30 second cleanup
- Enable full 100% parallelization (all 4 tests concurrent)
Blocks: WINC-1537 Consolidate WINC test templates using Go text/template to reduce file count and improve maintainability (To Do)