Type: Task
Resolution: Unresolved
Priority: Undefined
Fix Version/s: None
Affects Version/s: CNV v4.19.z, CNV v4.20.z, CNV v4.21.z
Component/s: CNV QE DevOps
Labels:
- check-relevancy

Activity Type:
Quality / Stability / Reliability
Story Points:
0.42
Blocked:
False
Blocked Reason:
None
Ready:
False
Component Fix Version(s):
None
Regression:
None


Problem

The test job test-kubevirt-cnv-4.21-operator-ocs is flaky: the virt-handler rollout takes ~85 seconds on a 3-worker cluster, which leaves little headroom under the 120-second timeout.

Test run: https://jenkins-csb-cnvqe-main.dno.corp.redhat.com/job/test-kubevirt-cnv-4.21-operator-ocs/test_results_analyzer/

Note: this affects at least three versions (4.19, 4.20, and 4.21).
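
To put the margin in numbers, here is a minimal sketch that uses only the two figures quoted above (roughly 85 seconds observed, 120-second timeout). The function name is illustrative and not part of any test suite.

```python
def timeout_headroom(observed_s: float, timeout_s: float) -> tuple[float, float]:
    """Return (seconds of slack, fraction of the timeout already consumed)."""
    if timeout_s <= 0:
        raise ValueError("timeout_s must be positive")
    return timeout_s - observed_s, observed_s / timeout_s


# Figures from this ticket: ~85 s rollout against a 120 s timeout.
slack_s, used = timeout_headroom(observed_s=85, timeout_s=120)
print(f"slack: {slack_s:.0f}s, timeout consumed: {used:.0%}")
# slack: 35s, timeout consumed: 71%
```

With about 71% of the budget consumed on a typical run, a slowdown of roughly 35 seconds on a shared cluster is enough to trip the assertion.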

Context

The job runs as part of the kubevirt-t1 scheduled job
It shares a single cluster with all kubevirt-t1 lanes
The tests are explicitly written against an upstream environment that has a single worker
Simply increasing the timeout does not address the root cause; it only moves the failure to the next assertion that depends on worker/virt-handler operations completing within a fixed time
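
The last point can be illustrated with a small sketch. The step names, timeouts, and durations below are hypothetical values chosen only for illustration (they are not taken from the job's results). The idea is that when every worker-dependent step runs slow, raising one timeout only moves the failure to the next step.

```python
from typing import Optional


def first_failing_step(steps: list[tuple[str, float]],
                       durations: dict[str, float]) -> Optional[str]:
    """Return the first step whose observed duration exceeds its timeout."""
    for name, timeout_s in steps:
        if durations[name] > timeout_s:
            return name
    return None


# Hypothetical slow run on a multi-worker cluster (illustrative numbers only).
slow_run = {"virt-handler rollout": 125.0, "next worker-dependent check": 70.0}

# Original timeouts: the rollout assertion fails first.
original = [("virt-handler rollout", 120.0), ("next worker-dependent check", 60.0)]
# Timeout raised for the rollout only: the failure moves one step down.
bumped = [("virt-handler rollout", 180.0), ("next worker-dependent check", 60.0)]

print(first_failing_step(original, slow_run))  # virt-handler rollout
print(first_failing_step(bumped, slow_run))    # next worker-dependent check
```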

Options Discussed

Option 1: Reduce worker count

Pros: Matches upstream test environment (single worker)
Cons: Requires running the test job separately from t1-scheduled (not sharing cluster with other lanes)
Status: Preferred by test developers (dsionov, lyarwood)

Option 2: Increase timeout

Pros: Minimal changes required
Cons: Doesn't address the root cause: the tests are written for a single-worker environment
Status: Not recommended by test developers

Option 3: Mark nodes during operator lane setup

Description: Keep a single worker node schedulable and mark the others as unschedulable
Pros: Could simplify implementation while keeping job in t1-scheduled
Status: Needs investigation
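
As a starting point for that investigation, here is a rough sketch of how the lane setup could decide which workers to cordon. It only builds the `oc adm cordon` command lines and does not run them. The worker names are hypothetical, and keeping the alphabetically first worker schedulable is an arbitrary assumption, not something agreed on in this ticket.

```python
import shlex


def cordon_plan(workers: list[str], keep: int = 1) -> list[str]:
    """Return `oc adm cordon` commands that leave only `keep` workers schedulable."""
    if keep < 1 or keep > len(workers):
        raise ValueError("keep must be between 1 and the number of workers")
    # Sort so the choice of the schedulable worker is deterministic (assumption).
    ordered = sorted(workers)
    return [f"oc adm cordon {shlex.quote(name)}" for name in ordered[keep:]]


# Hypothetical node names for a 3-worker cluster.
for cmd in cordon_plan(["worker-2", "worker-0", "worker-1"]):
    print(cmd)
# oc adm cordon worker-1
# oc adm cordon worker-2
```

The lane teardown would need the matching `oc adm uncordon` calls so the other kubevirt-t1 lanes get the full cluster back. One thing to check during the investigation: DaemonSet pods tolerate the unschedulable taint by default, so cordoning alone may not reduce the number of virt-handler pods taking part in the rollout.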

Action Items

[ ] Investigate feasibility of Option 3 (marking nodes as unschedulable)
[ ] Determine where to run this test job if Option 1 is chosen (separate from t1-scheduled)
[ ] Implement chosen solution
[ ] Verify test stability after fix

Stakeholders

Reporter: lbednar
Test Developers: dsionov, lyarwood

Assignee: Daniel Keler

Reporter: Lukas Bednar

Contributors: Daniel Sionov, Lee Yarwood

QA Contact: Daniel Keler

Votes: 0

Watchers: 5

Created: 2026/01/12 11:39 AM

Updated: 2026/01/28 8:01 AM
