Bug
Resolution: Done-Errata
Undefined
None
4.16
Important
None
False
Bug Fix
Done
Description of problem:
While debugging recent cyclictest failures on OCP 4.16 (kernel 5.14.0-427.22.1.el9_4.x86_64+rt), we discovered that the "psi=1" kernel command-line argument, which is now added by default because cgroups v2 is enabled, causes latency issues: both cyclictest and timerlat fail to meet the latency KPIs we commit to for Telco RAN DU deployments. See RHEL-42737 for reference.
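A minimal sketch for checking whether a node boots with psi=1, assuming a Linux host that exposes /proc/cmdline. The helper name `psi_from_cmdline` is hypothetical; it recognizes only a subset of the boolean spellings the kernel accepts for psi=, and treating the last psi= token as the effective one is an assumption.

```python
import os
from typing import Optional

# Assumption: a subset of the spellings the kernel's boolean parser accepts.
_TRUE_VALUES = {"1", "y", "yes", "on"}
_FALSE_VALUES = {"0", "n", "no", "off"}


def psi_from_cmdline(cmdline: str) -> Optional[bool]:
    """Return the psi= setting found on a kernel command line.

    Returns True or False for the last recognized psi= token, or None when
    psi= is absent (the kernel's build-time default then applies).
    """
    result: Optional[bool] = None
    for token in cmdline.split():
        if not token.startswith("psi="):
            continue
        value = token[len("psi="):].lower()
        if value in _TRUE_VALUES:
            result = True
        elif value in _FALSE_VALUES:
            result = False
        # Unrecognized values are skipped (assumption, for illustration).
    return result


if __name__ == "__main__":
    path = "/proc/cmdline"
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            print("psi= on cmdline:", psi_from_cmdline(f.read()))
    else:
        print(f"{path} not found; not a Linux host?")
```

On an affected node the script would be expected to report True, since psi=1 is present on the command line.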
Version-Release number of selected component (if applicable):
OCP 4.16
How reproducible:
Cyclictest and timerlat consistently fail on long-duration runs (e.g., 12 hours).
Steps to Reproduce:
1. Install OCP 4.16 and configure it with the Telco RAN DU reference configuration.
2. Run a long-duration cyclictest or timerlat test.
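Step 2 can be sketched as a cyclictest invocation assembled in Python. The helper `cyclictest_cmd`, the priority of 95, the 1000 µs interval, and the example CPU list are illustrative assumptions, not the Telco RAN DU reference settings; timerlat runs are not covered by this sketch.

```python
import shlex
from typing import List


def cyclictest_cmd(cpus: str, duration: str = "12h",
                   priority: int = 95, interval_us: int = 1000) -> List[str]:
    """Build a long-duration cyclictest command line (illustrative values)."""
    return [
        "cyclictest",
        "--mlockall",                 # lock memory to avoid page-fault latency
        f"--priority={priority}",     # SCHED_FIFO priority of measurement threads
        f"--interval={interval_us}",  # base wakeup interval in microseconds
        f"--duration={duration}",     # run length, e.g. "12h"
        f"--affinity={cpus}",         # CPUs to measure on, e.g. isolated cores
        "--threads",                  # no count given: one thread per CPU, per cyclictest(8)
        "--quiet",                    # print only the final summary
    ]


if __name__ == "__main__":
    # Hypothetical CPU list; substitute the node's isolated CPUs.
    print(shlex.join(cyclictest_cmd("2-5")))
```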
Actual results:
Maximum latencies above 20 µs are observed.
Expected results:
All latencies are below 20 µs.
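The pass/fail criterion can be expressed as a small check, assuming per-CPU maximum latencies (in µs) have already been extracted from the cyclictest or timerlat output. The function name `cpus_over_kpi` is hypothetical, and the values in the usage line are illustrative, not measured results.

```python
from typing import Dict

# KPI from the expected results: all latencies below 20 µs.
KPI_MAX_LATENCY_US = 20


def cpus_over_kpi(max_latency_us: Dict[int, int],
                  kpi_us: int = KPI_MAX_LATENCY_US) -> Dict[int, int]:
    """Return {cpu: max_latency} for every CPU not strictly below the KPI."""
    return {cpu: lat for cpu, lat in sorted(max_latency_us.items())
            if lat >= kpi_us}


if __name__ == "__main__":
    # Illustrative per-CPU maxima in µs, not measured results.
    failures = cpus_over_kpi({2: 9, 3: 23, 4: 12})
    print("PASS" if not failures else f"FAIL: {failures}")
```

A value of exactly 20 µs is counted as a failure here, because the expected result requires latencies strictly below 20 µs.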
Additional info:
See RHEL-42737 for test results and debugging information. This was originally suspected to be a RHEL issue, but it turned out that PSI is enabled by OpenShift code, which adds psi=1 to the kernel command line.
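To confirm that PSI is actually active at runtime, and not only requested on the command line, one option is to look for the /proc/pressure interface. This sketch assumes the kernel creates /proc/pressure only when PSI is enabled at boot; the helper name `psi_interface_present` is hypothetical.

```python
import os


def psi_interface_present(proc_root: str = "/proc") -> bool:
    """Return True if <proc_root>/pressure/cpu exists.

    Assumption: the kernel creates /proc/pressure only when PSI is enabled
    at boot, so its presence suggests psi=1 (or a PSI-on default) took effect.
    """
    return os.path.exists(os.path.join(proc_root, "pressure", "cpu"))


if __name__ == "__main__":
    print("PSI interface present:", psi_interface_present())
```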
- blocks: OCPBUGS-37485 PSI causing latency issues (Closed)
- is cloned by: OCPBUGS-37485 PSI causing latency issues (Closed)
- links to: RHEA-2024:3718 OpenShift Container Platform 4.17.z bug fix update