OCPBUGS-50522: 4.19 Perf Regression: cluster-density pod latency sees 4-second increase


      Description of problem:

      OCP 4.19 cluster-density-v2 podReadyLatencies regressed significantly.
      
      The 99th-percentile latency was stable at 11s and is now consistently above 15s, sometimes reaching ___
      
      Max latency fluctuated somewhat but was also stable; it now sometimes reaches 40s.
      
      The stats:
      Before the change: 99th percentile 11.064s +/- 0.235s, max 12.676s +/- 2.056s
      After the change: 99th percentile 15.650s +/- 0.489s, max 20.591s +/- 6.609s
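
      For scale, a quick back-of-the-envelope check (using only the numbers above) shows how far outside the pre-regression noise band the new measurements fall:

```python
# Rough effect-size check using the numbers quoted above (all values in seconds).
# "sigma" is the run-to-run standard deviation of the pre-regression measurements.
before_p99, sigma_p99 = 11.064, 0.235
after_p99 = 15.650

before_max, sigma_max = 12.676, 2.056
after_max = 20.591

# How many pre-regression standard deviations the new means sit above the old ones.
print(f"p99 shift: {(after_p99 - before_p99) / sigma_p99:.1f} sigma")  # ~19.5 sigma
print(f"max shift: {(after_max - before_max) / sigma_max:.1f} sigma")  # ~3.9 sigma
```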
      
      

      Version-Release number of selected component (if applicable):

      There are multiple change points across several nightlies. The Hunter change-point detection algorithm identifies (see the illustrative sketch after this list):
      * the max latency change was introduced between 4.19.0-0.nightly-2025-01-27-130640 and 4.19.0-0.nightly-2025-01-28-090833
      * the 99th-percentile latency change was introduced between 4.19.0-0.nightly-2025-01-28-090833 and 4.19.0-0.nightly-2025-01-30-091858
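
      For intuition only: Hunter detects change points statistically (E-divisive means, to my understanding). The sketch below is a crude mean-shift stand-in for that idea, not Hunter's implementation, and the per-nightly p99 values in it are hypothetical.

```python
import math
import statistics

def find_change_point(series):
    """Crude mean-shift detector: pick the split index that maximizes the
    difference of the left/right means, normalized by a pooled standard
    deviation. Illustrative stand-in for a real change-point algorithm."""
    best_idx, best_score = None, 0.0
    for i in range(2, len(series) - 1):
        left, right = series[:i], series[i:]
        pooled = math.sqrt(
            (len(left) * statistics.pvariance(left)
             + len(right) * statistics.pvariance(right)) / len(series)
        ) or 1e-9
        score = abs(statistics.mean(right) - statistics.mean(left)) / pooled
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx, best_score

# Hypothetical per-nightly p99 podReady latencies (seconds), oldest first.
p99_by_nightly = [11.1, 10.9, 11.2, 11.0, 11.3, 15.2, 15.8, 15.6, 16.1]
idx, score = find_change_point(p99_by_nightly)
print(f"change point between run {idx - 1} and run {idx}, score {score:.2f}")
```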
      
      

      How reproducible:

      100%. These values have remained consistently high and unstable through today (Feb 10).
      

      Steps to Reproduce:

      1. Run the payload control-plane test in prow: `/pj-rehearse periodic-ci-openshift-qe-ocp-qe-perfscale-ci-main-aws-4.19-nightly-x86-payload-control-plane-6nodes` (or observe the current job history triggered on each nightly build); a sketch for pulling the pod latency numbers out of the job artifacts follows below.
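
      Once the job completes, the pod-latency quantiles can be pulled from the kube-burner artifacts. The filename and field names in the sketch below are assumptions and may need adjusting to match the actual artifact layout.

```python
import json

# Hypothetical artifact path; adjust to the actual kube-burner output file
# collected from the prow job.
ARTIFACT = "podLatencyQuantilesMeasurement-cluster-density-v2.json"

with open(ARTIFACT) as f:
    docs = json.load(f)

# Report P99/max for the pod "Ready" condition; kube-burner typically records
# these latencies in milliseconds (an assumption worth verifying).
for doc in docs:
    if doc.get("quantileName") == "Ready":
        print(f"{doc.get('jobName', 'unknown')}: "
              f"P99={doc['P99'] / 1000:.2f}s max={doc['max'] / 1000:.2f}s")
```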
      

      Actual results:

      Observe that cluster-density-v2 p99 is >= 15s and max is between 17s and 40s.
      

      Expected results:

      cluster-density-v2 p99 is 11s and max is 12s
      

      Additional info:

      Our 11s 99th-percentile expectation is not an overly sensitive threshold. The increase from 11s to 15s indicates a significant reduction in throughput across the platform, comparable to running at a higher workload density or a higher client QPS scaling rate.
      
      We should treat this as a perceptible regression in user experience and cluster stability.
      
