RHEL-6141

[RHEL9.2] most or some of IMB-IO benchmarks fail on iRDMA and QEDR ROCE


    • Yes
    • None
    • rhel-net-drivers
    • ssg_networking
    • None
    • False
    • False
    • None
    • None
    • None
    • None
    • If docs needed, set a value
    • None
    • 57,005

      Description of problem:

      When run on E810 iRDMA RoCE, the following openmpi IMB-IO benchmarks failed (failures are marked with "<<<==========="):

      FAIL | 1 | NFS mount cannot be set: IMB-IO benchmarks may fail
      PASS | 0 | openmpi IMB-IO S_Write_indv mpirun one_core
      PASS | 0 | openmpi IMB-IO S_Read_indv mpirun one_core
      PASS | 0 | openmpi IMB-IO S_Write_expl mpirun one_core
      PASS | 0 | openmpi IMB-IO S_Read_expl mpirun one_core
      FAIL | 1 | openmpi IMB-IO P_Write_indv mpirun one_core <<<===========
      FAIL | 1 | openmpi IMB-IO P_Read_indv mpirun one_core <<<===========
      FAIL | 1 | openmpi IMB-IO P_Write_expl mpirun one_core <<<===========
      FAIL | 1 | openmpi IMB-IO P_Read_expl mpirun one_core <<<===========
      FAIL | 1 | openmpi IMB-IO P_Write_shared mpirun one_core <<<===========
      FAIL | 1 | openmpi IMB-IO P_Read_shared mpirun one_core <<<===========
      PASS | 0 | openmpi IMB-IO P_Write_priv mpirun one_core
      PASS | 0 | openmpi IMB-IO P_Read_priv mpirun one_core
      PASS | 0 | openmpi IMB-IO C_Write_indv mpirun one_core
      FAIL | 1 | openmpi IMB-IO C_Read_indv mpirun one_core <<<===========
      FAIL | 1 | openmpi IMB-IO C_Write_expl mpirun one_core <<<===========
      FAIL | 1 | openmpi IMB-IO C_Read_expl mpirun one_core <<<===========
      FAIL | 1 | openmpi IMB-IO C_Write_shared mpirun one_core <<<===========
      FAIL | 1 | openmpi IMB-IO C_Read_shared mpirun one_core <<<===========
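
To make the pattern in the listing above easier to see, the result lines can be tallied per benchmark family (S_, P_, C_). The following is a minimal sketch, not part of the test harness; the function names `parse_results` and `failed_benchmarks` are hypothetical, and the line format is assumed to be exactly `STATUS | exit code | description` as shown in this report.

```python
import re
from collections import Counter

# Result lines copied from this report, including the "<<<===" annotations.
RESULTS = """\
FAIL | 1 | NFS mount cannot be set: IMB-IO benchmarks may fail
PASS | 0 | openmpi IMB-IO S_Write_indv mpirun one_core
PASS | 0 | openmpi IMB-IO S_Read_indv mpirun one_core
PASS | 0 | openmpi IMB-IO S_Write_expl mpirun one_core
PASS | 0 | openmpi IMB-IO S_Read_expl mpirun one_core
FAIL | 1 | openmpi IMB-IO P_Write_indv mpirun one_core <<<===========
FAIL | 1 | openmpi IMB-IO P_Read_indv mpirun one_core <<<===========
FAIL | 1 | openmpi IMB-IO P_Write_expl mpirun one_core <<<===========
FAIL | 1 | openmpi IMB-IO P_Read_expl mpirun one_core <<<===========
FAIL | 1 | openmpi IMB-IO P_Write_shared mpirun one_core <<<=========
FAIL | 1 | openmpi IMB-IO P_Read_shared mpirun one_core <<<===========
PASS | 0 | openmpi IMB-IO P_Write_priv mpirun one_core
PASS | 0 | openmpi IMB-IO P_Read_priv mpirun one_core
PASS | 0 | openmpi IMB-IO C_Write_indv mpirun one_core
FAIL | 1 | openmpi IMB-IO C_Read_indv mpirun one_core <<<===========
FAIL | 1 | openmpi IMB-IO C_Write_expl mpirun one_core <<<===========
FAIL | 1 | openmpi IMB-IO C_Read_expl mpirun one_core <<<===========
FAIL | 1 | openmpi IMB-IO C_Write_shared mpirun one_core<<<===========
FAIL | 1 | openmpi IMB-IO C_Read_shared mpirun one_core<<<===========
"""

# STATUS | exit code | description, with an optional trailing "<<<===" marker.
LINE_RE = re.compile(r"^\s*(PASS|FAIL)\s*\|\s*(\d+)\s*\|\s*(.+?)\s*(?:<<<=+)?\s*$")
# Only lines of the form "openmpi IMB-IO <benchmark> mpirun ..." name a benchmark.
BENCH_RE = re.compile(r"openmpi IMB-IO (\S+) mpirun")


def parse_results(text):
    """Return a list of (status, exit_code, description) tuples."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            rows.append((match.group(1), int(match.group(2)), match.group(3)))
    return rows


def failed_benchmarks(rows):
    """Return the names of the IMB-IO benchmarks whose status is FAIL."""
    names = []
    for status, _code, description in rows:
        match = BENCH_RE.search(description)
        if status == "FAIL" and match:
            names.append(match.group(1))
    return names


if __name__ == "__main__":
    failures = failed_benchmarks(parse_results(RESULTS))
    # Group by the prefix before the first underscore (S, P or C).
    per_family = Counter(name.split("_", 1)[0] for name in failures)
    print("failed:", ", ".join(failures))
    print("failures per family:", dict(per_family))
```

The first line of the listing (the NFS mount setup check) is parsed as a result row but is deliberately not counted as a benchmark, because it does not match the `openmpi IMB-IO <benchmark> mpirun` pattern.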

      This is a regression compared with the RHEL 9.1 Beta compose test results for OpenMPI, where no such failures were found.

      Version-Release number of selected component (if applicable):

      Clients: rdma-dev-31
      Servers: rdma-dev-30

      DISTRO=RHEL-9.2.0-20221122.2

      + [22-12-06 01:03:14] cat /etc/redhat-release
      Red Hat Enterprise Linux release 9.2 Beta (Plow)

      + [22-12-06 01:03:14] uname -a
      Linux rdma-dev-31.rdma.lab.eng.rdu2.redhat.com 5.14.0-197.el9.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Nov 16 14:31:27 EST 2022 x86_64 x86_64 x86_64 GNU/Linux

      + [22-12-06 01:03:14] cat /proc/cmdline
      BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.14.0-197.el9.x86_64 root=UUID=d3cd314b-8715-4183-afb9-daafd8d9ad53 ro crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M resume=UUID=2a841cae-212e-4b91-ad1d-097144a800dc console=ttyS0,115200n81

      + [22-12-06 01:03:14] rpm -q rdma-core linux-firmware
      rdma-core-41.0-3.el9.x86_64
      linux-firmware-20221012-128.el9.noarch

      + [22-12-06 01:03:14] tail /sys/class/infiniband/irdma0/fw_ver /sys/class/infiniband/irdma1/fw_ver
      ==> /sys/class/infiniband/irdma0/fw_ver <==
      1.52

      ==> /sys/class/infiniband/irdma1/fw_ver <==
      1.52
      + [22-12-06 01:03:14] lspci
      + [22-12-06 01:03:14] grep -i -e ethernet -e infiniband -e omni -e ConnectX
      04:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
      04:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
      04:00.2 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
      04:00.3 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
      44:00.0 Ethernet controller: Intel Corporation Ethernet Controller E810-C for QSFP (rev 02)
      44:00.1 Ethernet controller: Intel Corporation Ethernet Controller E810-C for QSFP (rev 02)

      Installed:
      mpitests-openmpi-5.8-1.el9.x86_64 openmpi-1:4.1.1-5.el9.x86_64
      openmpi-devel-1:4.1.1-5.el9.x86_64
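
For comparing environments between a passing and a failing run, the firmware versions read above with `tail /sys/class/infiniband/*/fw_ver` can also be collected programmatically. This is a minimal sketch under the assumption that each RDMA device directory exposes a `fw_ver` file, as the irdma devices in this report do; the helper name `read_fw_versions` is hypothetical.

```python
from pathlib import Path


def read_fw_versions(sysfs_root="/sys/class/infiniband"):
    """Map each RDMA device name to the stripped contents of its fw_ver file.

    Devices without a readable fw_ver file are skipped; a missing
    sysfs_root yields an empty dict.
    """
    versions = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return versions
    for device in sorted(root.iterdir()):
        fw_file = device / "fw_ver"
        if fw_file.is_file():
            versions[device.name] = fw_file.read_text().strip()
    return versions


if __name__ == "__main__":
    for name, version in read_fw_versions().items():
        print(f"{name}: {version}")
```

The root directory is a parameter so that the function can be pointed at a copy of the sysfs tree captured from another host.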

      How reproducible:

      Steps to Reproduce:
      1. Refer to the following Beaker job:

      https://beaker.engineering.redhat.com/jobs/7269089

      See "RS:10921438" in that job for the openmpi testing.

      2.
      3.
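
Outside of Beaker, a single failing benchmark could in principle be re-run by hand. The sketch below only builds and prints an `mpirun` command line; it does not execute anything. It is not the command used by the test harness, which this report does not show. The binary name `mpitests-IMB-IO`, the process count of 2, and the use of `--host` with the two lab hosts are assumptions made for illustration, and the helper name `build_imb_io_command` is hypothetical.

```python
import shlex

# Hosts named in this report (server and client).
HOSTS = ["rdma-dev-30", "rdma-dev-31"]

# Assumption: name of the IMB-IO binary as installed by mpitests-openmpi.
IMB_IO_BINARY = "mpitests-IMB-IO"


def build_imb_io_command(benchmark, hosts=HOSTS, binary=IMB_IO_BINARY):
    """Build an mpirun argument list that runs one IMB-IO benchmark.

    One process is requested per host; the benchmark name is passed to
    IMB-IO as a positional argument.
    """
    if not hosts:
        raise ValueError("at least one host is required")
    return [
        "mpirun",
        "-np", str(len(hosts)),
        "--host", ",".join(hosts),
        binary,
        benchmark,
    ]


if __name__ == "__main__":
    # P_Write_indv is one of the benchmarks reported as failing.
    print(shlex.join(build_imb_io_command("P_Write_indv")))
```

Any MCA parameters or working directory used by the original job would need to be taken from the Beaker logs rather than from this sketch.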

      Actual results:

      Expected results:

      Additional info:

              kheib Kamal Heib
              bchae Brian Chae (Inactive)
              Kamal Heib
              infiniband-qe
