OpenShift Bugs / OCPBUGS-37485

PSI causing latency issues

    • Important
    • None
    • False
    • Disable PSI within the kernel since it is causing latency issues.
    • Bug Fix
    • In Progress

      This is a clone of issue OCPBUGS-37271. The following is the description of the original issue:

      Description of problem:

      While debugging recent cyclictest issues on OCP 4.16 (kernel 5.14.0-427.22.1.el9_4.x86_64+rt), we discovered that the "psi=1" kernel cmdline argument, which is now added by default because cgroups v2 is enabled, causes latency issues: both cyclictest and timerlat fail to meet the latency KPIs we commit to for Telco RAN DU deployments. See RHEL-42737 for reference.
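
One way to confirm whether a node was booted with PSI requested is to look for the psi= argument in its kernel command line. The check below is a minimal sketch, not part of the original report: the helper name psi_cmdline_value is hypothetical, and it assumes that if psi= appears more than once, the last occurrence is the one that matters.

```python
from typing import Optional


def psi_cmdline_value(cmdline: str) -> Optional[str]:
    """Return the value of the last `psi=` argument, or None if it is absent.

    Assumption: when `psi=` is given several times, the last one is what counts.
    """
    value = None
    for arg in cmdline.split():
        if arg.startswith("psi="):
            value = arg[len("psi="):]
    return value


def read_boot_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read()
```

On a node, psi_cmdline_value(read_boot_cmdline()) would return "1" if the argument described above is present, and None if no psi= argument was passed at all.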
      

      Version-Release number of selected component (if applicable):

      OCP 4.16

      How reproducible:

      Cyclictest and timerlat consistently fail on long-duration runs (e.g. 12 hours).

      Steps to Reproduce:

          1. Install OCP 4.16 and configure with the Telco RAN DU reference configuration.
          2. Run a long-duration cyclictest or timerlat test.
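
For step 2, a long-duration cyclictest invocation could be assembled roughly as follows. This is only a sketch: the helper name cyclictest_argv and every value in it (priority, interval, the 12h duration taken from the example above) are illustrative assumptions, not the settings used for the runs in this report.

```python
import shlex
from typing import List


def cyclictest_argv(duration: str = "12h", priority: int = 95,
                    interval_us: int = 1000) -> List[str]:
    """Build an argument list for a long-duration cyclictest run.

    All defaults are illustrative assumptions, not the report's settings.
    """
    return [
        "cyclictest",
        "--mlockall",                 # lock memory so page faults do not add latency
        "--quiet",                    # print only the summary when the run ends
        f"--priority={priority}",     # real-time priority of the measuring thread
        f"--interval={interval_us}",  # wake-up interval in microseconds
        f"--duration={duration}",     # run length; suffixes m, h and d are accepted
    ]


# Show the command as it would be typed in a shell.
print(shlex.join(cyclictest_argv()))
```

Building the argument list separately from running it keeps the sketch side-effect free; the list can be handed to subprocess.run on a host where cyclictest is installed.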

      Actual results:

      Maximum latencies above 20us are detected.

      Expected results:

      All latencies are below 20us.
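
The pass/fail criterion above can be checked mechanically against a cyclictest summary. The sketch below assumes the usual per-thread summary line shape ("T: <n> ... Max: <value>", values in microseconds); the helper name threads_over_threshold is hypothetical and the regular expression may need adjusting for other output modes such as histograms.

```python
import re
from typing import List

THRESHOLD_US = 20  # latency limit quoted in this report

# Assumed line shape: "T: 0 ( 1234) P:95 I:1000 C: 100000 Min: 1 Act: 2 Avg: 2 Max: 15"
_SUMMARY_RE = re.compile(r"^T:\s*(\d+).*\bMax:\s*(\d+)")


def threads_over_threshold(output: str,
                           threshold_us: int = THRESHOLD_US) -> List[int]:
    """Return the thread numbers whose maximum latency exceeds the threshold."""
    offenders = []
    for line in output.splitlines():
        match = _SUMMARY_RE.match(line.strip())
        if match and int(match.group(2)) > threshold_us:
            offenders.append(int(match.group(1)))
    return offenders
```

An empty result means every parsed thread stayed at or below the limit; lines that do not match the assumed shape are ignored rather than treated as failures.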

      Additional info:

      See RHEL-42737 for test results and debugging information. This was originally suspected to be a RHEL issue, but PSI is in fact enabled by OpenShift code, which adds psi=1 to the kernel cmdline.

            Team MCO (team-mco)
            OpenShift Prow Bot (openshift-crt-jira-prow)
            Sergio Regidor de la Rosa
            Aidan Reilly
            Votes: 0
            Watchers: 8
