Type: Epic
Resolution: Unresolved
Priority: Normal
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
- phase-1-stabilization
- qe

Epic Name:
Fix flaky tests and platform-specific issues
Epic Status:
In Progress
Activity Type:
Quality / Stability / Reliability
Hierarchy Progress Bar:

0% To Do, 5% In Progress, 95% Done
Blocked:
False
Blocked Reason:

User Story

As a test engineer, I want to fix all identified flaky tests and platform-specific issues so that the test suite achieves stable execution across all platforms.

Description

Based on the monitoring data from Story 1, this story focuses on systematically fixing all tests that fail to meet the 99% pass rate threshold. Work includes root cause analysis, code fixes for race conditions, timeout adjustments, resource cleanup improvements, and platform-specific configuration corrections.

This story is the core stabilization work in Phase 1 and is expected to be the most time-intensive. Fixes should be validated on all affected platforms before marking as complete.

Required

Root cause analysis completed for all failing tests identified in Story 1

Code fixes implemented for race conditions and timing issues

Platform-specific configurations corrected (timeouts, resource limits, etc.)

Resource cleanup issues resolved

All fixes validated on respective platforms with improved pass rates

Nice to have

Test code refactoring for improved maintainability

Additional logging/diagnostics for future debugging

Documentation of common failure patterns and solutions

Engineering Details

Based on findings from Story 1 monitoring

Expected issues: race conditions, timing dependencies, resource cleanup, platform-specific timeouts

Test framework: Ginkgo

Platforms: AWS, Azure, vSphere, GCP, Nutanix

Repository: openshift-tests-private

Each fix should be tested on all platforms where the issue was observed

Acceptance Criteria

Root cause analysis documented for every test with pass rate <99%

All race conditions and timing issues are fixed

Platform-specific timeout and resource configurations are optimized

Resource cleanup issues are resolved (no leaked resources)

All fixes are verified on the platforms where failures occurred

Test pass rates show measurable improvement (tracking toward 99%)
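
To make the 99% criterion concrete, the sketch below computes a pass rate per test and platform and lists the entries that fall below the threshold, worst first. The testResult type, the field names, and the sample figures are hypothetical. Real inputs would come from the Story 1 monitoring data, whose format is not described here.

```go
package main

import (
	"fmt"
	"sort"
)

// testResult is a hypothetical record of runs for one test on one platform.
type testResult struct {
	Name     string
	Platform string
	Passed   int
	Total    int
}

// passRate returns the pass rate as a percentage, or 0 when there are no runs.
func passRate(r testResult) float64 {
	if r.Total == 0 {
		return 0
	}
	return 100 * float64(r.Passed) / float64(r.Total)
}

// belowThreshold returns the results whose pass rate is strictly below the
// threshold, sorted from lowest to highest rate. Entries with no runs are
// skipped because they carry no signal.
func belowThreshold(results []testResult, threshold float64) []testResult {
	var out []testResult
	for _, r := range results {
		if r.Total == 0 {
			continue
		}
		if passRate(r) < threshold {
			out = append(out, r)
		}
	}
	sort.SliceStable(out, func(i, j int) bool {
		return passRate(out[i]) < passRate(out[j])
	})
	return out
}

func main() {
	// Made-up sample data for illustration only.
	results := []testResult{
		{Name: "example-test-1", Platform: "aws", Passed: 198, Total: 200},
		{Name: "example-test-2", Platform: "azure", Passed: 97, Total: 100},
		{Name: "example-test-3", Platform: "vsphere", Passed: 45, Total: 50},
	}
	for _, r := range belowThreshold(results, 99.0) {
		fmt.Printf("%s on %s: %.1f%%\n", r.Name, r.Platform, passRate(r))
	}
}
```

The comparison is strict, so a test at exactly 99% counts as meeting the threshold, which matches the "<99%" wording in the criteria.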
Ready:
False
Color Status:
Not Selected
Size:
None

Target Version:
None
Release Blocker:
None

clones

WINC-1551 Monitor new platform test results from PR #71726

In Progress

is cloned by

WINC-1553 Achieve >=99% pass rate across all platforms

To Do

is related to

WINC-1697 Fix curl commands on Windows pods - use Invoke-WebRequest

To Do

WINC-1709 Fix OCP-60944: Add Flexy bastion discovery and SSH validation for platform 'none'

To Do

WINC-1696 Fix OCP-42204: Windows pod with Projected Volume fails in Prow CI

In Progress

links to

openshift/openshift-tests-private#28624: Fix OCP-33612: Update payload file format expectations

openshift/openshift-tests-private#28625: Fix hybrid-overlay-node service race condition on Prow CI (OCP-84267, OCP-76765, OCP-74760, OCP-54711)

openshift/openshift-tests-private#28626: Fix OCP-77777: Add time parameter to waitUntilWMCOStatusChanged

openshift/openshift-tests-private#28635: Fix PowerShell command quoting in Windows service functions (WINC-1561) OCP-60944 fix

openshift/openshift-tests-private#29146: WINC-1660: Fix winc template file path resolution for Prow CI

openshift/openshift-tests-private#29206: Fix curl commands on Windows pods - use Invoke-WebRequest

openshift/openshift-tests-private#29256: Add bastion helper scripts for WINC testing


Assignee: Unassigned

Reporter: Weinan Liu

Need Info From: None

Votes: 0

Watchers: 3

Created: 2026/01/08 11:03 AM

Updated: 2026/03/04 6:30 AM
