Type: Bug
Resolution: Unresolved
Priority: Normal
Fix Version/s: None
Affects Version/s: 4.20, 4.21
Component/s: Node Tuning Operator
Labels:
- telco

Activity Type: None
Blocked: False
Blocked Reason: None
Story Points: None
Severity: Moderate
Regression: None
Architecture: All
Target Backport Versions: None
Target Version: 4.22
Release Blocker: None
Sprint: CNF Compute Sprint 284
sprint_count: 1
SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:
PX Impact Score:
Release Note Status: None
Release Note Type: None
Release Note Text: None
Escape Reason: None
Escape Impact: None
Corrective Measures: None
SDLC stage when should've been found: None

Description of problem:

Latency tests are described in our customer documentation here:
https://docs.redhat.com/en/documentation/openshift_container_platform/4.21/html/scalability_and_performance/cnf-performing-platform-verification-latency-tests#cnf-measuring-latency_cnf-latency-tests

If the LATENCY_TEST_CPUS parameter is not supplied, the test is skipped with the following message:
[SKIPPED] Skip the test, the requested number of CPUs should be even to avoid noisy neighbor situation

This is due to the following check:
https://github.com/openshift/cluster-node-tuning-operator/blame/7916d0fc178a08ff83794f3f5fff9779885340c4/test/e2e/performanceprofile/functests/4_latency/latency.go#L84

I think the code defaults to all the available CPUs minus 1. When the number of available CPUs is even, that default is odd, so the check fails and the test is skipped instead of run.
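To make the suspected failure mode concrete, here is a minimal sketch in Go. It is not the code from latency.go: the helper names `requestedCPUs` and `shouldSkip`, the use of 0 to mean "LATENCY_TEST_CPUS not supplied", and the count of 16 available CPUs are all assumptions made for illustration.

```go
package main

import "fmt"

// requestedCPUs models the suspected default (hypothetical helper, not taken
// from latency.go): when LATENCY_TEST_CPUS is not supplied (represented here
// by 0), fall back to all available CPUs minus one.
func requestedCPUs(latencyTestCPUs, availableCPUs int) int {
	if latencyTestCPUs > 0 {
		return latencyTestCPUs
	}
	return availableCPUs - 1
}

// shouldSkip models the evenness check behind the skip message:
// an odd CPU count causes the test to be skipped.
func shouldSkip(cpus int) bool {
	return cpus%2 != 0
}

func main() {
	available := 16 // assumed even number of available CPUs

	def := requestedCPUs(0, available)
	fmt.Printf("default:  %d CPUs, skipped: %v\n", def, shouldSkip(def))

	explicit := requestedCPUs(4, available)
	fmt.Printf("explicit: %d CPUs, skipped: %v\n", explicit, shouldSkip(explicit))
}
```

Under these assumptions the default resolves to 15 CPUs and is skipped, while an explicit even value such as 4 passes the check.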

Note that the example in the documentation for the hwlatdetect test omits the LATENCY_TEST_CPUS parameter, so any customer following these instructions will hit this issue.

Version-Release number of selected component (if applicable):

Seen in OCP 4.20

How reproducible:

Always

Steps to Reproduce:

1. Run a hwlatdetect test following the instructions in the customer documentation referenced above

Actual results:

The test is skipped

Expected results:

The test runs

Additional info:

We'll need to decide on an appropriate default value for LATENCY_TEST_CPUS when it is not supplied. Using all the available CPUs is probably not a good idea, as we do not recommend running latency tests on all or most of the CPUs in a server. The other option would be to make LATENCY_TEST_CPUS a mandatory parameter, but that is a bit unintuitive for the hwlatdetect test, which actually runs on ALL CPUs (using a kernel tracer), regardless of how many CPUs the cnf-tests container is using.
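One possible shape for such a default, offered only as a discussion aid: pick the largest even CPU count that fits within both the isolated CPUs and a small cap. The function name `defaultLatencyCPUs` and the cap parameter are hypothetical, and the cap value of 4 used in the example is an arbitrary assumption, not a recommendation from the documentation.

```go
package main

import "fmt"

// defaultLatencyCPUs returns the largest even CPU count that exceeds neither
// the number of isolated CPUs nor maxCPUs. It returns 0 when no positive even
// count fits. Hypothetical helper; the cap is an assumed policy knob.
func defaultLatencyCPUs(isolated, maxCPUs int) int {
	n := isolated
	if maxCPUs < n {
		n = maxCPUs
	}
	if n < 2 {
		return 0
	}
	return n - n%2 // round down to the nearest even number
}

func main() {
	const assumedCap = 4 // arbitrary example cap
	for _, isolated := range []int{1, 3, 15, 16} {
		fmt.Printf("isolated=%d -> default=%d\n",
			isolated, defaultLatencyCPUs(isolated, assumedCap))
	}
}
```

A default computed this way always satisfies the evenness check when it is non-zero, and it avoids claiming most of the CPUs in the server. A return value of 0 would still need explicit handling by the caller.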

Also note that the workaround is simply to pass the LATENCY_TEST_CPUS parameter on the command line. Given that there is a simple workaround, I would recommend that we fix this only in the current release rather than backporting it.

Also note that changing the default value for LATENCY_TEST_CPUS will likely require a customer documentation update. The documentation currently says:
LATENCY_TEST_CPUS: Specifies the number of CPUs that the pod running the latency tests uses. If you do not set the variable, the default configuration includes all isolated CPUs.

Assignee: Shereen Haj
Reporter: Bart Wensley
Need Info From: None
Contributors: None
QA Contact: Niranjan Mallapadi Raghavendra Rao
Doc Contact: None
Votes: 0
Watchers: 3
Created: 2026/02/04 1:29 PM
Updated: 2026/02/04 1:33 PM
