RHEL-92885

RHEL-10 Network Performance degradation up to 15% with small message sizes


    • Bug
    • Resolution: Unresolved
    • Normal
    • kernel / Networking
    • rhel-net-core
    • ssg_networking

      Hello everyone 

       

      We are seeing up to 15% network performance degradation, observed mainly with small and medium TCP message sizes.
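A message-size sweep like the one described can be sketched as follows. This is a minimal illustration, not the harness used for this report: the list of sizes, the test duration, and the server name passed on the command line are all assumptions. Only the iperf3 options `-c`, `-l`, `-t`, and `-J` and the JSON field `end.sum_received.bits_per_second` are taken from iperf3 itself.

```python
import json
import subprocess
import sys

# Hypothetical TCP message sizes in bytes; the report does not list the sizes used.
MESSAGE_SIZES = [64, 256, 1024, 4096, 16384]


def iperf3_command(server, length, seconds=10):
    """Build an iperf3 client command for one message size.

    -l sets the read/write buffer length, -t the duration in seconds,
    and -J requests JSON output.
    """
    return ["iperf3", "-c", server, "-l", str(length), "-t", str(seconds), "-J"]


def received_mbits(report):
    """Extract receiver-side throughput in Mbits/sec from an iperf3 JSON report."""
    return report["end"]["sum_received"]["bits_per_second"] / 1e6


def sweep(server, sizes=MESSAGE_SIZES):
    """Run one iperf3 test per message size and collect throughput by size."""
    results = {}
    for size in sizes:
        proc = subprocess.run(
            iperf3_command(server, size),
            check=True,
            capture_output=True,
            text=True,
        )
        results[size] = received_mbits(json.loads(proc.stdout))
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python sweep.py <iperf3-server-host>
    for size, mbits in sweep(sys.argv[1]).items():
        print(f"{size:>6} bytes: {mbits:.1f} Mbits/sec")
```

Running the same sweep on each kernel under test makes it easier to see at which message sizes the difference appears.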

      The largest iperf3 performance drop on Emerald Rapids occurred between kernels 6.3 and 6.4:

      5.14.0-427.13.1.el9_4: 189 Mbits/sec

      6.2.0-63.eln126: 186 Mbits/sec

      6.3.0-63.eln126: 183 Mbits/sec
      6.4.0-59.eln127: 171 Mbits/sec
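To put the four figures above on a common scale, the relative change between kernels can be computed directly. The sketch below uses only the numbers quoted in this report; the function names are illustrative.

```python
# Throughput figures quoted above, in Mbits/sec, in the order listed.
results = [
    ("5.14.0-427.13.1.el9_4", 189.0),
    ("6.2.0-63.eln126", 186.0),
    ("6.3.0-63.eln126", 183.0),
    ("6.4.0-59.eln127", 171.0),
]


def percent_change(old, new):
    """Relative change from old to new, in percent (negative means slower)."""
    return (new - old) / old * 100.0


def step_changes(rows):
    """Change between each kernel and the one listed immediately before it."""
    return [
        (prev[0], cur[0], percent_change(prev[1], cur[1]))
        for prev, cur in zip(rows, rows[1:])
    ]


if __name__ == "__main__":
    for older, newer, delta in step_changes(results):
        print(f"{older} -> {newer}: {delta:+.2f}%")
    total = percent_change(results[0][1], results[-1][1])
    print(f"{results[0][0]} -> {results[-1][0]}: {total:+.2f}%")
```

For this particular set of numbers, the 6.3 to 6.4 step accounts for roughly 6.6% of the drop, and the total from 5.14 to 6.4 is roughly 9.5%.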

      git log -S copy_user_short_string v6.3..v6.4

      shows these two commits from Linus:

      commit 427fda2c8a4977d9dbd9bc108bbe6e21ec84648d
          x86: improve on the non-rep 'copy_user' function

      commit adfcf4231b8cbc2d9c1e7bfaa965b907e60639eb
          x86: don't use REP_GOOD or ERMS for user memory copies

      I think the second commit (adfcf4231b8cbc2d9c1e7bfaa965b907e60639eb) might be the one you also found in your analysis. 

      I am wondering whether "rep movsb" is actually slower than the manual copy for small buffer sizes, at least on Emerald Rapids. I'll build a kernel to test that theory (along with the other fixes).

       

      The problem is visible only with selinux=enforcing, spectre_bhi=on, and mitigations enabled. With SELinux disabled or spectre_bhi=off, performance is the same on rhel-9.5 and rhel-10.
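Because the result depends on these settings, it may help to record them alongside each measurement. The following is a minimal sketch, assuming a Linux host with the standard `/proc` and `/sys` paths; the set of command-line options it picks out is an assumption about what is relevant here, not something specified in this report.

```python
from pathlib import Path

# Kernel command-line options assumed to be relevant to this issue.
RELEVANT_OPTIONS = ("selinux", "enforcing", "spectre_bhi", "mitigations")


def read_text(path):
    """Return stripped file contents, or None if the file cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def selinux_mode(enforce_value):
    """Map the content of /sys/fs/selinux/enforce to a mode name."""
    if enforce_value is None:
        # selinuxfs is not mounted or not visible from this process.
        return "disabled or unavailable"
    return "enforcing" if enforce_value == "1" else "permissive"


def cmdline_options(cmdline, keys=RELEVANT_OPTIONS):
    """Pick the selected key=value options out of a kernel command line."""
    found = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key in keys:
            found[key] = value
    return found


def collect():
    """Gather the settings that this report says influence the result."""
    return {
        "kernel": read_text("/proc/sys/kernel/osrelease"),
        "selinux": selinux_mode(read_text("/sys/fs/selinux/enforce")),
        "spectre_v2": read_text("/sys/devices/system/cpu/vulnerabilities/spectre_v2"),
        "cmdline": cmdline_options(read_text("/proc/cmdline") or ""),
    }


if __name__ == "__main__":
    for name, value in collect().items():
        print(f"{name}: {value}")
```

On kernels that report it, the BHI mitigation state appears as part of the `spectre_v2` vulnerability line, so that line is stored verbatim rather than parsed.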

       

      Please note that this Jira is a spin-off of the original Jira

      https://issues.redhat.com/browse/RHEL-40027

      which turned out to be a mix of two different performance-affecting issues and had also become long and bloated.

       

      Feel free to drop a comment if you are in doubt. 

      Thanks

      Adam

        1. 2024-Oct-09-emerald-rapids-results.tar.xz
          1.30 MB
        2. diff_selinux_off.html
          4.07 MB
        3. diff_selinux_permissive.html
          4.13 MB
        4. recvr_efficiecny.jpg
          78 kB
        5. remote cpu.jpg
          69 kB
        6. sender_efficiency.jpg
          78 kB
        7. throughput.jpg
          85 kB

              nst-kernel-bugs nst-kernel-bugs
              aokuliar Adam Okuliar