Type: Bug
Resolution: Unresolved
Priority: Undefined
Fix Version/s: None
Affects Version/s: 4.20
Component/s: Cloud Compute / vSphere Provider
Labels: None

Activity Type: Quality / Stability / Reliability
Blocked: False
Blocked Reason: None
Story Points: None
Severity: None
Regression: None

Target Backport Versions: None
Target Version: None
Release Blocker: None
Sprint: None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Impact Score:

Release Note Status: None
Release Note Type: None
Release Note Text: None

Escape Reason: None
Escape Impact: None
Corrective Measures: None
SDLC stage when should've been found: None

Description of problem:

We noticed an increase in the 95th-percentile (P95) disruption for the pod-to-host and host-to-host backends on the vSphere platform. A closer look shows that the disruption comes from a few jobs that test the host groups feature.

Here is a dashboard showing the trend:

https://grafana-loki.ci.openshift.org/d/ISnBj4LVk/disruption?var-platform=vsphere&var-percentile=P95&var-backend=host-to-host-new-connections&var-releases=4.20&var-upgrade_type=none&var-networks=ovn&var-topologies=ha&var-architectures=amd64&var-lookback=3&var-master_nodes_updated=N&var-min_disruption_regression=-10&var-min_disruption_job_list=0&var-min_relevance=0&var-featureset=techpreview&orgId=1

A few jobs show disruption. Here is one example:

periodic-ci-openshift-release-master-nightly-4.20-e2e-vsphere-host-groups-ovn-techpreview #1965131620608380928

We started a Slack thread here: https://redhat-internal.slack.com/archives/C015H2WDJRY/p1757438092386989

In the thread, we learned that host groups is a layered deployment, so lower performance is expected. It was suggested that the disruption test be disabled in this setup.

This card was created to track the issue. Who should grant permission to disable the disruption test in this case?

It is worth noting that most of the jobs do not show the same disruption.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

Assignee: Joseph Callen

Reporter: Ken Zhang

QA Contact: Shang Gao

Need Info From: None

Votes: 0

Watchers: 3

Created: 2025/09/10 5:43 PM

Updated: 2025/09/17 3:05 PM
