RHEL-6135

[RHEL9.2] libvma test fails consistently on "sockperf pingpong multicast pkey/vlan" when tested on MLX4 IB0 device


    • rhel-net-drivers
    • ssg_networking

      Description of problem:

      "sockperf pingpong multicast pkey/vlan" test case consistently fails when libvma is tested on MLX4 IB0 devices.

      vma test results on rdma-dev-01/rdma-dev-10 & Beaker job J:7297390:
      5.14.0-202.el9.x86_64, rdma-core-41.0-3.el9, mlx4, ib0, ConnectX-3 & mlx4_0
      Result | Status | Test
      -------------------------------------------------
      PASS | 0 | sockperf pingpong multicast
      PASS | 0 | sockperf throughput multicast
      PASS | 0 | sockperf throughput unicast
      PASS | 0 | sockperf pingpong unicast
      PASS | 0 | sockperf (100 sockets) pingpong multicast
      PASS | 0 | sockperf (100 sockets) pingpong unicast
      FAIL | 1 | sockperf pingpong multicast pkey/vlan
      PASS | 0 | sockperf pingpong unicast pkey/vlan
      Checking for failures and known issues:
      sockperf pingpong multicast pkey/vlan is NOT a known issue on any environment - consider filing a BZ

      +++++++++++++++++++++++++++++++++

      This is a regression: the same test passes on RHEL-9.1.0.

      (regression suite over the rdma-ib-mlx4 pool [RHEL-9.1.0: rdma-dev-01/10 - VMA only])
      vma test results on rdma-dev-01/rdma-dev-10 & Beaker job J:7297395:
      5.14.0-162.6.1.el9_1.x86_64, rdma-core-41.0-3.el9, mlx4, ib0, ConnectX-3 & mlx4_0
      Result | Status | Test
      -------------------------------------------------
      PASS | 0 | sockperf pingpong multicast
      PASS | 0 | sockperf throughput multicast
      PASS | 0 | sockperf throughput unicast
      PASS | 0 | sockperf pingpong unicast
      PASS | 0 | sockperf (100 sockets) pingpong multicast
      PASS | 0 | sockperf (100 sockets) pingpong unicast
      PASS | 0 | sockperf pingpong multicast pkey/vlan
      PASS | 0 | sockperf pingpong unicast pkey/vlan
      Checking for failures and known issues:
      no test failures
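For triage, the failing rows can be pulled out of either result table with a small filter. This is a minimal sketch, not part of the test harness: the function name `failed_tests` is hypothetical, and it assumes the table keeps the `Result | Status | Test` layout shown above.

```shell
#!/bin/bash
# Hypothetical helper: print the test name of every row whose Result is FAIL.
# Reads a "Result | Status | Test" table on stdin.
failed_tests() {
    # Columns are separated by " | "; leading indentation before FAIL is allowed.
    awk -F ' [|] ' '$1 ~ /^[[:space:]]*FAIL$/ { print $3 }'
}
```

Feeding it the RHEL-9.2 table should print only the pkey/vlan multicast row, and the RHEL-9.1.0 table should print nothing.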

      The same results were observed on another MLX4 IB0 host pair: rdma-virt-00/01

      Version-Release number of selected component (if applicable):

      Clients: rdma-dev-10
      Servers: rdma-dev-01

      DISTRO=RHEL-9.2.0-20221129.2

      + [22-12-01 23:01:51] cat /etc/redhat-release
      Red Hat Enterprise Linux release 9.2 Beta (Plow)

      + [22-12-01 23:01:51] uname -a
      Linux rdma-dev-10.rdma.lab.eng.rdu2.redhat.com 5.14.0-202.el9.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Nov 28 08:49:47 EST 2022 x86_64 x86_64 x86_64 GNU/Linux

      + [22-12-01 23:01:51] cat /proc/cmdline
      BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.14.0-202.el9.x86_64 root=UUID=152a7d1f-a161-442e-b5e2-0317b93bd26b ro intel_idle.max_cstate=0 intremap=no_x2apic_optout processor.max_cstate=0 console=tty0 rd_NO_PLYMOUTH crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M resume=UUID=6e8fa5c6-5310-4c1f-8b11-8d4c8335d977 console=ttyS1,115200n81

      + [22-12-01 23:01:51] rpm -q rdma-core linux-firmware
      rdma-core-41.0-3.el9.x86_64
      linux-firmware-20221012-128.el9.noarch

      + [22-12-01 23:01:51] tail /sys/class/infiniband/mlx4_0/fw_ver
      2.42.5000

      + [22-12-01 23:01:51] lspci
      + [22-12-01 23:01:51] grep -i -e ethernet -e infiniband -e omni -e ConnectX
      03:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5717 Gigabit Ethernet PCIe (rev 10)
      03:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5717 Gigabit Ethernet PCIe (rev 10)
      05:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]

      + [22-12-01 23:01:51] rpm -q libvma
      libvma-9.6.4-1.el9.x86_64

      How reproducible:
      100%

      Steps to Reproduce:

      + [22-12-01 23:07:31] export SERVER_IP1=172.31.0.101
      + [22-12-01 23:07:31] SERVER_IP1=172.31.0.101
      ++ [22-12-01 23:07:31] cat /etc/hosts
      ++ [22-12-01 23:07:31] grep ib0.2-
      ++ [22-12-01 23:07:31] awk -F ' ' '{print $1}'
      +++ [22-12-01 23:07:31] RQA_hn_tail rdma-dev-01
      +++ [22-12-01 23:07:31] echo rdma-dev-01
      +++ [22-12-01 23:07:31] sed -e 's/\..*//' -e s/rdma//
      ++ [22-12-01 23:07:31] grep dev-01
      + [22-12-01 23:07:31] export SERVER_IP2=172.31.2.101
      + [22-12-01 23:07:31] SERVER_IP2=172.31.2.101
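The trace above appears to resolve SERVER_IP2 by filtering /etc/hosts for an `ib0.2-` entry that matches the server host. A minimal sketch of that lookup follows, assuming each hosts line starts with the address; the function name `lookup_host_ip` is hypothetical and this is not the harness's own code.

```shell
#!/bin/bash
# Hypothetical helper: print the address of the first hosts-file line that
# contains both an interface tag and a host tag.
# $1: hosts file, $2: interface tag (e.g. "ib0.2-"), $3: host tag (e.g. "dev-01")
lookup_host_ip() {
    awk -v iface="$2" -v host="$3" \
        'index($0, iface) && index($0, host) { print $1; exit }' "$1"
}
```

With the lab's /etc/hosts, `lookup_host_ip /etc/hosts ib0.2- dev-01` would be expected to print the address exported as SERVER_IP2, if the entries follow that naming.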

      1. Prepare the RDMA server and client hosts for the libvma test with the current sockperf, using the RHEL-9.2 build specified above on MLX4 IB0 devices.
      2. On the server host, issue the following commands:

      LD_PRELOAD=libvma.so
      timeout --preserve-status --kill-after=5m 3m sockperf server -i 172.31.2.101

      3. On the client host, issue the following commands:

      LD_PRELOAD=libvma.so
      timeout --preserve-status --kill-after=5m 3m sockperf pp -i 172.31.2.101 -t 10 --msg-size=1472

      Actual results:

      Server host:

      VMA INFO: ---------------------------------------------------------------------------
      VMA INFO: VMA_VERSION: 9.6.4-1 Release built on Aug 3 2022 00:00:00
      VMA INFO: Cmd Line: date ++ [%y-%m-%d %H:%M:%S]
      VMA INFO: ---------------------------------------------------------------------------
      VMA INFO: Log Level INFO [VMA_TRACELEVEL]
      VMA INFO: ---------------------------------------------------------------------------
      + [22-12-01 23:11:02] LD_PRELOAD=libvma.so
      VMA INFO: ---------------------------------------------------------------------------
      VMA INFO: VMA_VERSION: 9.6.4-1 Release built on Aug 3 2022 00:00:00
      VMA INFO: Cmd Line: date ++ [%y-%m-%d %H:%M:%S]
      VMA INFO: ---------------------------------------------------------------------------
      VMA INFO: Log Level INFO [VMA_TRACELEVEL]
      VMA INFO: ---------------------------------------------------------------------------
      + [22-12-01 23:11:02] timeout --preserve-status --kill-after=5m 3m sockperf server -i 172.31.2.101
      VMA INFO: ---------------------------------------------------------------------------
      VMA INFO: VMA_VERSION: 9.6.4-1 Release built on Aug 3 2022 00:00:00
      VMA INFO: Cmd Line: timeout --preserve-status --kill-after=5m 3m sockperf server -i 172.31.2.101
      VMA INFO: ---------------------------------------------------------------------------
      VMA INFO: Log Level INFO [VMA_TRACELEVEL]
      VMA INFO: ---------------------------------------------------------------------------
      VMA INFO: ---------------------------------------------------------------------------
      VMA INFO: VMA_VERSION: 9.6.4-1 Release built on Aug 3 2022 00:00:00
      VMA INFO: Cmd Line: sockperf server -i 172.31.2.101
      VMA INFO: ---------------------------------------------------------------------------
      VMA INFO: Log Level INFO [VMA_TRACELEVEL]
      VMA INFO: ---------------------------------------------------------------------------
      sockperf: == version #3.10-0.git5ebd327da983 ==
      sockperf: ERROR: [fd=27] Can`t bind socket, IP to bind: 172.31.2.101:11111
      (errno=99 Cannot assign requested address)
      sockperf: cleanupAfterLoop() exit
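errno 99 (EADDRNOTAVAIL) on bind generally means the requested address is not assigned to any local interface, so one thing worth checking on the server is whether 172.31.2.101 is configured at all. A minimal sketch, assuming the one-line-per-address output of `ip -o -4 addr show`; the function name `has_local_addr` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the given IPv4 address appears in
# `ip -o -4 addr show` style output read from stdin.
# $1: IPv4 address to look for
has_local_addr() {
    # In the one-line format, field 4 is "address/prefix".
    awk -v ip="$1" '{ split($4, a, "/"); if (a[1] == ip) found = 1 }
                    END { exit !found }'
}
```

Usage on the server host: `ip -o -4 addr show | has_local_addr 172.31.2.101 && echo present || echo missing`.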

      Client host:

      VMA INFO: ---------------------------------------------------------------------------
      VMA INFO: VMA_VERSION: 9.6.4-1 Release built on Aug 3 2022 00:00:00
      VMA INFO: Cmd Line: date ++ [%y-%m-%d %H:%M:%S]
      VMA INFO: ---------------------------------------------------------------------------
      VMA INFO: Log Level INFO [VMA_TRACELEVEL]
      VMA INFO: ---------------------------------------------------------------------------
      + [22-12-01 23:11:12] LD_PRELOAD=libvma.so
      VMA INFO: ---------------------------------------------------------------------------
      VMA INFO: VMA_VERSION: 9.6.4-1 Release built on Aug 3 2022 00:00:00
      VMA INFO: Cmd Line: date ++ [%y-%m-%d %H:%M:%S]
      VMA INFO: ---------------------------------------------------------------------------
      VMA INFO: Log Level INFO [VMA_TRACELEVEL]
      VMA INFO: ---------------------------------------------------------------------------
      + [22-12-01 23:11:12] timeout --preserve-status --kill-after=5m 3m sockperf pp -i 172.31.2.101 -t 10 --msg-size=1472
      VMA INFO: ---------------------------------------------------------------------------
      VMA INFO: VMA_VERSION: 9.6.4-1 Release built on Aug 3 2022 00:00:00
      VMA INFO: Cmd Line: timeout --preserve-status --kill-after=5m 3m sockperf pp -i 172.31.2.101 -t 10 --msg-size=1472
      VMA INFO: ---------------------------------------------------------------------------
      VMA INFO: Log Level INFO [VMA_TRACELEVEL]
      VMA INFO: ---------------------------------------------------------------------------
      VMA INFO: ---------------------------------------------------------------------------
      VMA INFO: VMA_VERSION: 9.6.4-1 Release built on Aug 3 2022 00:00:00
      VMA INFO: Cmd Line: sockperf pp -i 172.31.2.101 -t 10 --msg-size=1472
      VMA INFO: ---------------------------------------------------------------------------
      VMA INFO: Log Level INFO [VMA_TRACELEVEL]
      VMA INFO: ---------------------------------------------------------------------------
      sockperf: == version #3.10-0.git5ebd327da983 ==
      sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)

      [ 0] IP = 172.31.2.101 PORT = 11111 # UDP
      sockperf: Warmup stage (sending a few dummy messages)...
      sockperf: Starting test...
      sockperf: Test end (interrupted by timer)
      sockperf: Test ended
      sockperf: No messages were received from the server. Is the server down?
      + [22-12-01 23:11:25] result=0
      + [22-12-01 23:11:25] '[' 0 -ne 0 ']'
      + [22-12-01 23:11:25] grep -qi -e ' error ' -e 'no messages were received' /tmp/vma.txt
      + [22-12-01 23:11:25] return 1
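The last four trace lines show how the harness decides pass or fail: the exit status is checked first, then the captured log is searched for failure markers. A minimal sketch of that logic, reconstructed from the trace only; the function name `check_vma_run` is hypothetical and the real harness may do more.

```shell
#!/bin/bash
# Hypothetical helper mirroring the traced check.
# $1: exit status of the sockperf run, $2: path to the captured log
# Returns 1 if the run failed or the log holds a failure marker, else 0.
check_vma_run() {
    local result=$1 log=$2
    if [ "$result" -ne 0 ]; then
        return 1
    fi
    # Same patterns as in the trace: case-insensitive, quiet match.
    if grep -qi -e ' error ' -e 'no messages were received' "$log"; then
        return 1
    fi
    return 0
}
```

This explains why the client run above is reported as FAIL even though sockperf exited with status 0: the log contains "No messages were received from the server".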

      Expected results: On RHEL-9.1.0, the same test case produces the following output:

      Server host:

      VMA INFO: ---------------------------------------------------------------------------
      VMA INFO: VMA_VERSION: 9.6.4-1 Release built on Aug 3 2022 00:00:00
      VMA INFO: Cmd Line: date ++ [%y-%m-%d %H:%M:%S]
      VMA INFO: ---------------------------------------------------------------------------
      VMA INFO: Log Level INFO [VMA_TRACELEVEL]
      VMA INFO: ---------------------------------------------------------------------------
      + [22-12-01 23:49:36] LD_PRELOAD=libvma.so
      VMA INFO: ---------------------------------------------------------------------------
      VMA INFO: VMA_VERSION: 9.6.4-1 Release built on Aug 3 2022 00:00:00
      VMA INFO: Cmd Line: date ++ [%y-%m-%d %H:%M:%S]
      VMA INFO: ---------------------------------------------------------------------------
      VMA INFO: Log Level INFO [VMA_TRACELEVEL]
      VMA INFO: ---------------------------------------------------------------------------
      + [22-12-01 23:49:36] timeout --preserve-status --kill-after=5m 3m sockperf server -i 172.31.2.101
      VMA INFO: ---------------------------------------------------------------------------
      VMA INFO: VMA_VERSION: 9.6.4-1 Release built on Aug 3 2022 00:00:00
      VMA INFO: Cmd Line: timeout --preserve-status --kill-after=5m 3m sockperf server -i 172.31.2.101
      VMA INFO: ---------------------------------------------------------------------------
      VMA INFO: Log Level INFO [VMA_TRACELEVEL]
      VMA INFO: ---------------------------------------------------------------------------
      VMA INFO: ---------------------------------------------------------------------------
      VMA INFO: VMA_VERSION: 9.6.4-1 Release built on Aug 3 2022 00:00:00
      VMA INFO: Cmd Line: sockperf server -i 172.31.2.101
      VMA INFO: ---------------------------------------------------------------------------
      VMA INFO: Log Level INFO [VMA_TRACELEVEL]
      VMA INFO: ---------------------------------------------------------------------------
      VMA WARNING: *******************************************************************************************************
      VMA WARNING: * IPoIB mode of interface 'mlx4_ib0.8010' is "connected" !
      VMA WARNING: * Please change it to datagram: "echo datagram > /sys/class/net/mlx4_ib0.8010/mode" before loading your application with VMA library
      VMA WARNING: * VMA doesn't support IPoIB in connected mode.
      VMA WARNING: * Please refer to VMA Release Notes for more information
      VMA WARNING: *******************************************************************************************************
      VMA WARNING: *******************************************************************************************************
      VMA WARNING: * IPoIB mode of interface 'mlx4_ib0.8012' is "connected" !
      VMA WARNING: * Please change it to datagram: "echo datagram > /sys/class/net/mlx4_ib0.8012/mode" before loading your application with VMA library
      VMA WARNING: * VMA doesn't support IPoIB in connected mode.
      VMA WARNING: * Please refer to VMA Release Notes for more information
      VMA WARNING: *******************************************************************************************************
      sockperf: == version #3.10-0.git5ebd327da983 ==
      sockperf: [SERVER] listen on:
      [ 0] IP = 172.31.2.101 PORT = 11111 # UDP
      sockperf: Warmup stage (sending a few dummy messages)...
      sockperf: [tid 58899] using recvfrom() to block on socket(s)
      Terminated
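The VMA warnings in this log name the sysfs file that holds each interface's IPoIB mode. To compare the two releases, it may help to list which IPoIB interfaces exist and what mode each reports. A minimal sketch, assuming the `/sys/class/net/<iface>/mode` layout quoted in the warning; the function name `list_ipoib_modes` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print "<iface> <mode>" for every interface that
# exposes a readable IPoIB mode file.
# $1: sysfs net directory (defaults to /sys/class/net)
list_ipoib_modes() {
    local root=${1:-/sys/class/net} f
    for f in "$root"/*/mode; do
        # Skip the unexpanded glob and unreadable entries.
        [ -r "$f" ] || continue
        printf '%s %s\n' "$(basename "$(dirname "$f")")" "$(cat "$f")"
    done
}
```

Running `list_ipoib_modes` on both the RHEL-9.1.0 and RHEL-9.2 hosts would show whether the same pkey child interfaces (such as mlx4_ib0.8010 and mlx4_ib0.8012 above) are present in each case.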

      Client host:

      VMA INFO: ---------------------------------------------------------------------------
      VMA INFO: VMA_VERSION: 9.6.4-1 Release built on Aug 3 2022 00:00:00
      VMA INFO: Cmd Line: date ++ [%y-%m-%d %H:%M:%S]
      VMA INFO: ---------------------------------------------------------------------------
      VMA INFO: Log Level INFO [VMA_TRACELEVEL]
      VMA INFO: ---------------------------------------------------------------------------
      + [22-12-01 23:49:46] LD_PRELOAD=libvma.so
      VMA INFO: ---------------------------------------------------------------------------
      VMA INFO: VMA_VERSION: 9.6.4-1 Release built on Aug 3 2022 00:00:00
      VMA INFO: Cmd Line: date ++ [%y-%m-%d %H:%M:%S]
      VMA INFO: ---------------------------------------------------------------------------
      VMA INFO: Log Level INFO [VMA_TRACELEVEL]
      VMA INFO: ---------------------------------------------------------------------------
      + [22-12-01 23:49:46] timeout --preserve-status --kill-after=5m 3m sockperf pp -i 172.31.2.101 -t 10 --msg-size=1472
      VMA INFO: ---------------------------------------------------------------------------
      VMA INFO: VMA_VERSION: 9.6.4-1 Release built on Aug 3 2022 00:00:00
      VMA INFO: Cmd Line: timeout --preserve-status --kill-after=5m 3m sockperf pp -i 172.31.2.101 -t 10 --msg-size=1472
      VMA INFO: ---------------------------------------------------------------------------
      VMA INFO: Log Level INFO [VMA_TRACELEVEL]
      VMA INFO: ---------------------------------------------------------------------------
      VMA INFO: ---------------------------------------------------------------------------
      VMA INFO: VMA_VERSION: 9.6.4-1 Release built on Aug 3 2022 00:00:00
      VMA INFO: Cmd Line: sockperf pp -i 172.31.2.101 -t 10 --msg-size=1472
      VMA INFO: ---------------------------------------------------------------------------
      VMA INFO: Log Level INFO [VMA_TRACELEVEL]
      VMA INFO: ---------------------------------------------------------------------------
      VMA WARNING: *******************************************************************************************************
      VMA WARNING: * IPoIB mode of interface 'mlx4_ib0.8010' is "connected" !
      VMA WARNING: * Please change it to datagram: "echo datagram > /sys/class/net/mlx4_ib0.8010/mode" before loading your application with VMA library
      VMA WARNING: * VMA doesn't support IPoIB in connected mode.
      VMA WARNING: * Please refer to VMA Release Notes for more information
      VMA WARNING: *******************************************************************************************************
      VMA WARNING: *******************************************************************************************************
      VMA WARNING: * IPoIB mode of interface 'mlx4_ib0.8012' is "connected" !
      VMA WARNING: * Please change it to datagram: "echo datagram > /sys/class/net/mlx4_ib0.8012/mode" before loading your application with VMA library
      VMA WARNING: * VMA doesn't support IPoIB in connected mode.
      VMA WARNING: * Please refer to VMA Release Notes for more information
      VMA WARNING: *******************************************************************************************************
      sockperf: == version #3.10-0.git5ebd327da983 ==
      sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)

      [ 0] IP = 172.31.2.101 PORT = 11111 # UDP
      sockperf: Warmup stage (sending a few dummy messages)...
      sockperf: Starting test...
      sockperf: Test end (interrupted by timer)
      sockperf: Test ended
      sockperf: [Total Run] RunTime=10.000 sec; Warm up time=400 msec; SentMessages=1603591; ReceivedMessages=1603590
      sockperf: ========= Printing statistics for Server No: 0
      sockperf: [Valid Duration] RunTime=9.550 sec; SentMessages=1543650; ReceivedMessages=1543650
      sockperf: ====> avg-latency=3.066 (std-dev=0.948, mean-ad=0.203, median-ad=0.082, siqr=0.065, cv=0.309, std-error=0.001, 99.0% ci=[3.064, 3.068])
      sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
      sockperf: Summary: Latency is 3.066 usec
      sockperf: Total 1543650 observations; each percentile contains 15436.50 observations
      sockperf: ---> <MAX> observation = 107.296
      sockperf: ---> percentile 99.999 = 95.022
      sockperf: ---> percentile 99.990 = 10.879
      sockperf: ---> percentile 99.900 = 9.260
      sockperf: ---> percentile 99.000 = 7.216
      sockperf: ---> percentile 90.000 = 3.175
      sockperf: ---> percentile 75.000 = 3.039
      sockperf: ---> percentile 50.000 = 2.951
      sockperf: ---> percentile 25.000 = 2.909
      sockperf: ---> <MIN> observation = 2.716
      + [22-12-01 23:49:59] result=0
      + [22-12-01 23:49:59] '[' 0 -ne 0 ']'
      + [22-12-01 23:49:59] grep -qi -e ' error ' -e 'no messages were received' /tmp/vma.txt
      + [22-12-01 23:49:59] return 0

      Additional info:

              kheib Kamal Heib
              bchae Brian Chae (Inactive)
              Kamal Heib Kamal Heib
              infiniband-qe infiniband-qe infiniband-qe infiniband-qe