RHEL-6138

[RHEL8.8] most of IMB IO benchmarks fail when openmpi tests are run on iRDMA & QEDR ROCE devices


    • Bug
    • Resolution: Won't Do
    • Undefined
    • None
    • rhel-8.8.0
    • openmpi
    • Yes
    • None
    • 1
    • rhel-net-drivers
    • ssg_networking
    • 1
    • False
    • False
    • None
    • Network Drivers 6
    • None
    • None
    • If docs needed, set a value
    • None
    • 57,005

      Description of problem:

      The following IMB-IO benchmarks fail when the OpenMPI test is run on an iRDMA (E810) RoCE device.

      FAIL | 1 | NFS mount cannot be set: IMB-IO benchmarks may fail <<<=====
      PASS | 0 | openmpi IMB-MPI1 PingPong mpirun one_core
      PASS | 0 | openmpi IMB-MPI1 PingPing mpirun one_core
      PASS | 0 | openmpi IMB-MPI1 Sendrecv mpirun one_core
      PASS | 0 | openmpi IMB-MPI1 Exchange mpirun one_core
      PASS | 0 | openmpi IMB-MPI1 Bcast mpirun one_core
      PASS | 0 | openmpi IMB-MPI1 Allgather mpirun one_core
      PASS | 0 | openmpi IMB-MPI1 Allgatherv mpirun one_core
      PASS | 0 | openmpi IMB-MPI1 Gather mpirun one_core
      PASS | 0 | openmpi IMB-MPI1 Gatherv mpirun one_core
      PASS | 0 | openmpi IMB-MPI1 Scatter mpirun one_core
      PASS | 0 | openmpi IMB-MPI1 Scatterv mpirun one_core
      PASS | 0 | openmpi IMB-MPI1 Alltoall mpirun one_core
      PASS | 0 | openmpi IMB-MPI1 Alltoallv mpirun one_core
      PASS | 0 | openmpi IMB-MPI1 Reduce mpirun one_core
      PASS | 0 | openmpi IMB-MPI1 Reduce_scatter mpirun one_core
      PASS | 0 | openmpi IMB-MPI1 Allreduce mpirun one_core
      PASS | 0 | openmpi IMB-MPI1 Barrier mpirun one_core
      PASS | 0 | openmpi IMB-IO S_Write_indv mpirun one_core
      PASS | 0 | openmpi IMB-IO S_Read_indv mpirun one_core
      PASS | 0 | openmpi IMB-IO S_Write_expl mpirun one_core
      PASS | 0 | openmpi IMB-IO S_Read_expl mpirun one_core
      FAIL | 1 | openmpi IMB-IO P_Write_indv mpirun one_core <<<===========
      FAIL | 1 | openmpi IMB-IO P_Read_indv mpirun one_core <<<===========
      FAIL | 1 | openmpi IMB-IO P_Write_expl mpirun one_core <<<===========
      FAIL | 1 | openmpi IMB-IO P_Read_expl mpirun one_core <<<===========
      FAIL | 1 | openmpi IMB-IO P_Write_shared mpirun one_core <<<===========
      FAIL | 1 | openmpi IMB-IO P_Read_shared mpirun one_core <<<===========
      PASS | 0 | openmpi IMB-IO P_Write_priv mpirun one_core
      PASS | 0 | openmpi IMB-IO P_Read_priv mpirun one_core
      FAIL | 1 | openmpi IMB-IO C_Write_indv mpirun one_core <<<===========
      FAIL | 1 | openmpi IMB-IO C_Read_indv mpirun one_core <<<===========
      FAIL | 1 | openmpi IMB-IO C_Write_expl mpirun one_core <<<===========
      FAIL | 1 | openmpi IMB-IO C_Read_expl mpirun one_core <<<===========
      FAIL | 1 | openmpi IMB-IO C_Write_shared mpirun one_core <<<===========
      FAIL | 1 | openmpi IMB-IO C_Read_shared mpirun one_core <<<===========
      PASS | 0 | openmpi IMB-EXT Window mpirun one_core
      PASS | 0 | openmpi IMB-EXT Unidir_Put mpirun one_core
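
A result listing like the one above can be summarized mechanically. The following is a minimal Python sketch, assuming every result line has the form `STATUS | code | description` as shown. The names `parse_results` and `failed_benchmarks` are illustrative and not part of the test harness, and the sample is a subset of the lines above.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   FAIL | 1 | openmpi IMB-IO P_Write_indv mpirun one_core <<<===========
# The trailing "<<<====" highlight marker is optional and is discarded.
LINE_RE = re.compile(r"^\s*(PASS|FAIL)\s*\|\s*(\d+)\s*\|\s*(.*?)\s*(?:<<<=+)?\s*$")


def parse_results(text):
    """Return a list of (status, exit_code, description) tuples."""
    results = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            results.append((m.group(1), int(m.group(2)), m.group(3)))
    return results


def failed_benchmarks(results):
    """Group the names of failed benchmarks by suite (e.g. IMB-IO)."""
    failed = defaultdict(list)
    for status, _code, desc in results:
        fields = desc.split()
        # Benchmark lines look like: openmpi <suite> <benchmark> mpirun one_core
        if status == "FAIL" and len(fields) >= 3 and fields[0] == "openmpi":
            failed[fields[1]].append(fields[2])
    return dict(failed)


sample = """\
FAIL | 1 | NFS mount cannot be set: IMB-IO benchmarks may fail <<<=====
PASS | 0 | openmpi IMB-IO S_Read_expl mpirun one_core
FAIL | 1 | openmpi IMB-IO P_Write_indv mpirun one_core <<<===========
PASS | 0 | openmpi IMB-IO P_Write_priv mpirun one_core
FAIL | 1 | openmpi IMB-IO C_Read_shared mpirun one_core <<<===========
"""

results = parse_results(sample)
print(failed_benchmarks(results))
```

In the full listing, all twelve failed benchmarks belong to IMB-IO: every P_* and C_* benchmark except P_Write_priv and P_Read_priv. The S_* benchmarks and the two *_priv benchmarks pass.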

      This is a regression from RHEL-8.7.0, where all benchmarks in OPENMPI passed.

      Version-Release number of selected component (if applicable):

      Clients: rdma-dev-31
      Servers: rdma-dev-30

      DISTRO=RHEL-8.8.0-20221120.2

      + [22-11-28 01:20:45] cat /etc/redhat-release
      Red Hat Enterprise Linux release 8.8 Beta (Ootpa)

      + [22-11-28 01:20:45] uname -a
      Linux rdma-dev-31.rdma.lab.eng.rdu2.redhat.com 4.18.0-438.el8.x86_64 #1 SMP Mon Nov 14 13:08:07 EST 2022 x86_64 x86_64 x86_64 GNU/Linux

      + [22-11-28 01:20:45] cat /proc/cmdline
      BOOT_IMAGE=(hd0,gpt2)/vmlinuz-4.18.0-438.el8.x86_64 root=UUID=91133fda-1ba7-44d0-89eb-c8ed0099906d ro crashkernel=auto resume=UUID=b48ecc4f-d16e-4635-95eb-2d8235a4c80b console=ttyS0,115200n81

      + [22-11-28 01:20:45] rpm -q rdma-core linux-firmware
      rdma-core-41.0-1.el8.x86_64
      linux-firmware-20220726-110.git150864a4.el8.noarch

      + [22-11-28 01:20:45] tail /sys/class/infiniband/irdma0/fw_ver /sys/class/infiniband/irdma1/fw_ver
      ==> /sys/class/infiniband/irdma0/fw_ver <==
      1.52

      ==> /sys/class/infiniband/irdma1/fw_ver <==
      1.52

      + [22-11-28 01:20:45] lspci
      + [22-11-28 01:20:45] grep -i -e ethernet -e infiniband -e omni -e ConnectX
      04:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
      04:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
      04:00.2 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
      04:00.3 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
      44:00.0 Ethernet controller: Intel Corporation Ethernet Controller E810-C for QSFP (rev 02)
      44:00.1 Ethernet controller: Intel Corporation Ethernet Controller E810-C for QSFP (rev 02)

      Installed:
      mpitests-openmpi-5.8-1.el8.x86_64 openmpi-1:4.1.1-3.el8.x86_64
      openmpi-devel-1:4.1.1-3.el8.x86_64

      -------------------

      Please refer to the openmpi test results in the following Beaker job for details:

      https://beaker.engineering.redhat.com/jobs/7265214 [ RS:10915234 ]
      https://beaker-archive.host.prod.eng.bos.redhat.com/beaker-logs/2022/11/72652/7265214/12983806/153040643/714627738/resultoutputfile.log

      How reproducible:
      100%

      Steps to Reproduce:
      1.
      2.
      3.

      Actual results:

      Expected results:

      All openmpi tests should pass on all RDMA HCAs.

      Additional info:

      For the RHEL-8.7.0 OpenMPI test results, refer to the following Beaker run:

      https://beaker.engineering.redhat.com/jobs/6965989
      https://beaker.engineering.redhat.com/recipes/12533439#tasks

      where ALL benchmarks PASSED
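
Because this is reported as a regression relative to RHEL-8.7.0, one way to compare the two Beaker runs is to diff their result listings. The following is a rough Python sketch, assuming both logs use the `STATUS | code | description` line format seen in this report. The function names are illustrative. The `old_run` sample is a hypothetical placeholder consistent with the statement that all benchmarks passed; it is not copied from that job's log.

```python
def status_by_test(text):
    """Map each test description to its PASS/FAIL status."""
    statuses = {}
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and parts[0] in ("PASS", "FAIL"):
            # Drop the trailing "<<<====" highlight marker, if present.
            desc = parts[2].split("<<<")[0].strip()
            statuses[desc] = parts[0]
    return statuses


def regressions(old_text, new_text):
    """Return tests that passed in the old run but fail in the new one."""
    old = status_by_test(old_text)
    new = status_by_test(new_text)
    return sorted(
        test for test, status in new.items()
        if status == "FAIL" and old.get(test) == "PASS"
    )


# Hypothetical excerpt standing in for the RHEL-8.7.0 log.
old_run = """\
PASS | 0 | openmpi IMB-IO P_Write_indv mpirun one_core
PASS | 0 | openmpi IMB-IO P_Write_priv mpirun one_core
"""

# Lines taken from the RHEL-8.8.0 listing in this report.
new_run = """\
FAIL | 1 | openmpi IMB-IO P_Write_indv mpirun one_core <<<===========
PASS | 0 | openmpi IMB-IO P_Write_priv mpirun one_core
"""

print(regressions(old_run, new_run))
```

Tests that appear only in the newer log (for example, a setup check missing from the older run) are not reported as regressions, since the older run carries no passing result for them.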

              kheib Kamal Heib
              bchae Brian Chae (Inactive)
              Kamal Heib
              infiniband-qe