OpenShift Bugs / OCPBUGS-37485

PSI causing latency issues


    • Important
    • Disable PSI within the kernel since it is causing latency issues.
    • Bug Fix
    • In Progress

      This is a clone of issue OCPBUGS-37271. The following is the description of the original issue:

      Description of problem:

      While debugging recent cyclictest failures on OCP 4.16 (kernel 5.14.0-427.22.1.el9_4.x86_64+rt), we discovered that the "psi=1" kernel cmdline argument, which is now added by default because cgroups v2 is enabled, causes latency issues: both cyclictest and timerlat fail to meet the latency KPIs we commit to for Telco RAN DU deployments. See RHEL-42737 for reference.
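To confirm whether a given node is affected, the kernel cmdline can be inspected for the psi= argument. The following is a minimal sketch, not the tooling used in the original investigation. The function names are hypothetical, and treating the last psi= token as the effective one is an assumption.

```python
from pathlib import Path
from typing import Optional


def psi_cmdline_value(cmdline: str) -> Optional[str]:
    """Return the value of the last psi=<value> token, or None if absent.

    Assumption: when psi= appears more than once, the last one is effective.
    """
    value = None
    for token in cmdline.split():
        if token.startswith("psi="):
            value = token[len("psi="):]
    return value


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the running kernel's command line (Linux only)."""
    return Path(path).read_text()


def psi_interface_present(root: str = "/proc/pressure") -> bool:
    """Check for the PSI pressure files, which are exposed when PSI is active."""
    return Path(root, "cpu").exists()
```

On a node, `psi_cmdline_value(read_cmdline())` returning "1" together with `psi_interface_present()` returning True would be consistent with the behavior described above.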
      

      Version-Release number of selected component (if applicable):

      OCP 4.16

      How reproducible:

      Cyclictest and timerlat consistently fail on long-duration runs (e.g. 12 hours).

      Steps to Reproduce:

          1. Install OCP 4.16 and configure with the Telco RAN DU reference configuration.
          2. Run a long-duration cyclictest or timerlat test.
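Step 2 could be scripted along these lines. This is only a sketch: the report does not give the exact invocation, so the priority and interval values here are illustrative assumptions, and only the 12-hour duration comes from the report. The helper name is hypothetical. The flags used (-m, -q, -p, -i, -D) are standard cyclictest options.

```python
from typing import List


def cyclictest_argv(duration: str = "12h",
                    priority: int = 95,
                    interval_us: int = 1000) -> List[str]:
    """Build an argv list for a long-duration cyclictest run.

    -m locks memory, -q prints only the final summary, -p sets the
    real-time priority, -i sets the interval in microseconds and
    -D sets the run duration (suffix m, h or d).
    The default priority and interval are assumptions, not values
    taken from the report.
    """
    return [
        "cyclictest", "-m", "-q",
        "-p", str(priority),
        "-i", str(interval_us),
        "-D", duration,
    ]
```

The list can be passed to `subprocess.run(..., capture_output=True, text=True)` on a host where cyclictest is installed, and the captured summary compared against the latency target.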

      Actual results:

      Maximum latencies above 20us are detected.

      Expected results:

      All latencies are below 20us.
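The pass/fail criterion above can be checked mechanically against a cyclictest summary. The sketch below assumes the usual per-thread summary lines containing a "Max:" field in microseconds. The sample values in the usage note are made up for illustration and are not results from this issue. Treating a value of exactly 20 as a failure is an assumption based on the wording "below 20us".

```python
import re
from typing import List

LATENCY_LIMIT_US = 20  # threshold quoted in this report

# Matches the "Max: <n>" field of a cyclictest summary line.
_MAX_RE = re.compile(r"\bMax:\s*(\d+)")


def max_latencies_us(output: str) -> List[int]:
    """Extract every per-thread maximum latency (in us) from the output."""
    return [int(m.group(1)) for m in _MAX_RE.finditer(output)]


def violations(output: str, limit_us: int = LATENCY_LIMIT_US) -> List[int]:
    """Return the maxima that are not strictly below the limit."""
    return [v for v in max_latencies_us(output) if not v < limit_us]
```

For example, given two summary lines ending in "Max: 12" and "Max: 27", `violations` reports only the second thread's value, so that run would count as a failure.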

      Additional info:

      See RHEL-42737 for test results and debugging information. This was originally suspected to be a RHEL issue, but PSI turns out to be enabled by OpenShift code, which adds psi=1 to the kernel cmdline.

              team-mco Team MCO
              openshift-crt-jira-prow OpenShift Prow Bot
              Sergio Regidor de la Rosa
              Aidan Reilly