OCPBUGS-42495 (OpenShift Bugs)

cyclictest shows >20us latency on Dell XR5610 running OCP 4.14


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Undefined
    • Version: 4.14.z
    • Component: Telco Performance
    • Status updates:
      2024/11/18: Continuing to make progress through scratch kernels and additional tracing.
      2024/11/01: Making good progress toward the root cause through tracing, but it is taking some time because scratch kernels are needed to add more tracing.

      Description of problem:

      When running cyclictest for 12 hours on a Dell XR5610 server (SPR-EE) with OCP 4.14.36, we consistently get good results when the DU profile is applied and the Intel FEC operator is not deployed. However, as soon as the Intel FEC operator is configured, latency spikes above 30 us appear.

      Depending on the run, there is either a single spike or several (4-5), usually on the same CPU.

      Version-Release number of selected component (if applicable):
      OCP 4.14.36 (also seen on earlier z-stream releases)
      Intel FEC operator: sriov-fec.v2.9.0
      Dell XR5610 server with:
      - BIOS 2.0.4
      - iDRAC firmware 7.10.50.10

      How reproducible: Always

      Steps to Reproduce:

      1. Deploy the XR5610 with the DU profile.
      2. Install the Intel FEC operator and configure a SriovFecClusterConfig resource.
      3. Run cyclictest.
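The report does not state the exact cyclictest options used. As a hedged sketch only, the helper below assembles one plausible 12-hour invocation in histogram mode, which is the mode that prints a "# Max Latencies:" summary like the one quoted under Expected results. The function name `cyclictest_argv`, the CPU IDs, the priority, and the histogram size are illustrative assumptions, not the reporter's settings.

```python
import shlex


def cyclictest_argv(cpus: list[int], duration: str = "12h",
                    priority: int = 95, histogram_us: int = 100) -> list[str]:
    """Return an ASSUMED cyclictest command line; options are not from the report."""
    if not cpus:
        raise ValueError("at least one CPU is required")
    cpu_list = ",".join(str(c) for c in cpus)
    return [
        "cyclictest",
        "-m",                  # mlockall: keep test memory resident
        "-q",                  # quiet: print only the summary on exit
        f"-p{priority}",       # SCHED_FIFO priority of the measurement threads
        f"-a{cpu_list}",       # pin measurement threads to these CPUs
        f"-t{len(cpus)}",      # one measurement thread per listed CPU
        f"-D{duration}",       # run length ("12h" = 12 hours)
        f"-h{histogram_us}",   # histogram up to N us, plus the Min/Avg/Max summary lines
    ]


if __name__ == "__main__":
    # Hypothetical CPU IDs; substitute the node's isolated CPUs.
    print(shlex.join(cyclictest_argv([2, 3, 4, 5])))
```

Short options with optional arguments (`-a`, `-t`) are written in attached form so that getopt parses them unambiguously.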

      Actual results: Max latency is up to 32 us

      Expected results:

      No latency values above 20 us, as seen when the operator is not deployed:

       # Max Latencies: 00010 00012 00010 00010 00010 00011 00011 00009 00011 00007
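To check a run against the 20 us expectation without reading the output by hand, the summary line can be parsed. This is a minimal sketch assuming the "# Max Latencies:" format shown above (one zero-padded value in microseconds per measurement thread); the function names and the fixed 20 us limit are illustrative. Columns map to CPUs only through the affinity options the run used, so the returned keys are thread indexes rather than CPU IDs.

```python
# Bound taken from the expected results: no value above 20 us.
LATENCY_LIMIT_US = 20


def parse_max_latencies(summary_line: str) -> list[int]:
    """Parse a cyclictest '# Max Latencies:' line into per-thread maxima (us)."""
    prefix = "# Max Latencies:"
    line = summary_line.strip()
    if not line.startswith(prefix):
        raise ValueError(f"not a max-latency line: {summary_line!r}")
    # int() accepts the zero-padded tokens, e.g. "00012" -> 12.
    return [int(token) for token in line[len(prefix):].split()]


def threads_over_limit(max_latencies: list[int],
                       limit_us: int = LATENCY_LIMIT_US) -> dict[int, int]:
    """Return {thread_index: max_latency_us} for every thread above limit_us."""
    return {i: v for i, v in enumerate(max_latencies) if v > limit_us}


if __name__ == "__main__":
    line = "# Max Latencies: 00010 00012 00010 00010 00010 00011 00011 00009 00011 00007"
    values = parse_max_latencies(line)
    print(max(values))                  # 12
    print(threads_over_limit(values))   # {}
```

A spike like the ones described in this report shows up as a non-empty dictionary, which also makes it easy to see whether repeated spikes land on the same thread.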

      Additional info:
      With the same configuration, we ran an rtla timerlat test and captured a latency spike above 32 us. The timerlat output is attached.
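To capture such a spike automatically, `rtla timerlat top` has an auto-trace mode (`-a`/`--auto`) that stops the session once a thread latency exceeds the given threshold in microseconds and saves the trace (by default to timerlat_trace.txt). The sketch below only assembles such a command. The report does not say which options were actually used, and the helper name `timerlat_argv`, the CPU IDs, the threshold, and the duration are assumptions.

```python
import shlex


def timerlat_argv(cpus: list[int], stop_above_us: int = 20,
                  duration: str = "12h") -> list[str]:
    """Return an ASSUMED `rtla timerlat top` command line; not from the report."""
    if not cpus:
        raise ValueError("at least one CPU is required")
    return [
        "rtla", "timerlat", "top",
        "-c", ",".join(str(c) for c in cpus),  # CPUs to measure
        "-d", duration,                         # session length
        "-a", str(stop_above_us),               # auto-trace: stop and save trace above this many us
    ]


if __name__ == "__main__":
    # Hypothetical CPU IDs; the threshold matches the 20 us bound used above.
    print(shlex.join(timerlat_argv([2, 3, 4, 5])))
```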

      People: Tomas Glozar (tglozar), Javier Pena (jpena@redhat.com), Yang Liu