-
Bug
-
Resolution: Done-Errata
-
Undefined
-
4.16
-
Important
-
None
-
False
-
-
Disable PSI within the kernel since it is causing latency issues.
-
Bug Fix
-
In Progress
-
This is a clone of issue OCPBUGS-37271. The following is the description of the original issue:
—
Description of problem:
While debugging recent cyclictest issues on OCP 4.16 (5.14.0-427.22.1.el9_4.x86_64+rt kernel), we discovered that the "psi=1" kernel cmdline argument, which is now added by default because cgroupsv2 is enabled, is causing latency issues: both cyclictest and timerlat fail to meet the latency KPIs we commit to for Telco RAN DU deployments. See RHEL-42737 for reference.
Version-Release number of selected component (if applicable):
OCP 4.16
How reproducible:
Cyclictest and timerlat consistently fail on long-duration runs (e.g. 12 hours).
Steps to Reproduce:
1. Install OCP 4.16 and configure with the Telco RAN DU reference configuration.
2. Run a long-duration cyclictest or timerlat test.
Actual results:
Maximum latencies are detected above 20us.
Expected results:
All latencies are below 20us.
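The pass/fail criterion above can be checked mechanically. The following is a minimal Python sketch, not the tooling used for this report: it assumes the captured text contains cyclictest's usual per-thread summary lines with a "Max:" field in microseconds, and the names max_latencies, meets_kpi and LATENCY_LIMIT_US are hypothetical.

```python
import re
from typing import List

# Threshold from this report: all latencies must stay below 20us.
LATENCY_LIMIT_US = 20

# Assumption: each per-thread summary line carries a "Max: <n>" field (in us).
_MAX_RE = re.compile(r"\bMax:\s*(\d+)")


def max_latencies(output: str) -> List[int]:
    """Return every 'Max:' value found in the captured cyclictest output."""
    return [int(m.group(1)) for m in _MAX_RE.finditer(output)]


def meets_kpi(output: str, limit_us: int = LATENCY_LIMIT_US) -> bool:
    """True only if at least one Max value was found and all are below the limit."""
    values = max_latencies(output)
    return bool(values) and max(values) < limit_us
```

A run with no parsable "Max:" values is treated as a failure, so that an empty or truncated log is not mistaken for a passing result.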
Additional info:
See RHEL-42737 for test results and debugging information. This was originally suspected to be an RHEL issue, but it turns out that PSI is being enabled by OpenShift code (which adds psi=1 to the kernel cmdline).
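To confirm whether a node was booted with PSI requested on the kernel cmdline, the argument can be looked up directly. This is a minimal Python sketch, assuming the cmdline is read from /proc/cmdline on the node and that, if "psi=" appears more than once, the last occurrence takes effect; psi_setting and read_cmdline are hypothetical helper names.

```python
from typing import Optional


def psi_setting(cmdline: str) -> Optional[str]:
    """Return the value of the last 'psi=' argument, or None if it is absent."""
    value = None
    for token in cmdline.split():
        if token.startswith("psi="):
            # Assumption: a later occurrence overrides an earlier one.
            value = token[len("psi="):]
    return value


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the kernel command line of the running system."""
    with open(path, encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    print("psi =", psi_setting(read_cmdline()))
```

A result of "1" matches the condition described in this issue. None means the argument is not on the cmdline, in which case the kernel's build-time default applies.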
- clones
-
OCPBUGS-37271 PSI causing latency issues
- Closed
- is blocked by
-
OCPBUGS-37271 PSI causing latency issues
- Closed
- links to
-
RHSA-2024:5107 OpenShift Container Platform 4.16.z security update