-
Bug
-
Resolution: Won't Do
-
rhel-9.1.0
-
rhel-net-drivers
-
ssg_networking
Description of problem:
The following OSU benchmarks fail with "Error in init phase" on a BCM57508 device when they are launched with "mpirun_rsh":
FAIL | 1 | mvapich2 OSU acc_latency mpirun_rsh one_core
FAIL | 1 | mvapich2 OSU allgather mpirun_rsh one_core
FAIL | 1 | mvapich2 OSU allgatherv mpirun_rsh one_core
FAIL | 1 | mvapich2 OSU allreduce mpirun_rsh one_core
FAIL | 1 | mvapich2 OSU alltoall mpirun_rsh one_core
FAIL | 1 | mvapich2 OSU alltoallv mpirun_rsh one_core
FAIL | 1 | mvapich2 OSU barrier mpirun_rsh one_core
FAIL | 1 | mvapich2 OSU bcast mpirun_rsh one_core
FAIL | 1 | mvapich2 OSU bibw mpirun_rsh one_core
FAIL | 1 | mvapich2 OSU bw mpirun_rsh one_core
FAIL | 1 | mvapich2 OSU cas_latency mpirun_rsh one_core
FAIL | 1 | mvapich2 OSU fop_latency mpirun_rsh one_core
FAIL | 1 | mvapich2 OSU gather mpirun_rsh one_core
FAIL | 1 | mvapich2 OSU gatherv mpirun_rsh one_core
FAIL | 1 | mvapich2 OSU get_acc_latency mpirun_rsh one_core
FAIL | 1 | mvapich2 OSU get_bw mpirun_rsh one_core
FAIL | 1 | mvapich2 OSU get_latency mpirun_rsh one_core
FAIL | 1 | mvapich2 OSU hello mpirun_rsh one_core
FAIL | 1 | mvapich2 OSU iallgather mpirun_rsh one_core
FAIL | 1 | mvapich2 OSU iallgatherv mpirun_rsh one_core
FAIL | 1 | mvapich2 OSU iallreduce mpirun_rsh one_core
FAIL | 1 | mvapich2 OSU ialltoall mpirun_rsh one_core
FAIL | 1 | mvapich2 OSU ialltoallv mpirun_rsh one_core
FAIL | 1 | mvapich2 OSU ialltoallw mpirun_rsh one_core
FAIL | 1 | mvapich2 OSU ibarrier mpirun_rsh one_core
FAIL | 1 | mvapich2 OSU ibcast mpirun_rsh one_core
FAIL | 1 | mvapich2 OSU igather mpirun_rsh one_core
FAIL | 1 | mvapich2 OSU igatherv mpirun_rsh one_core
FAIL | 1 | mvapich2 OSU init mpirun_rsh one_core
FAIL | 1 | mvapich2 OSU ireduce mpirun_rsh one_core
FAIL | 1 | mvapich2 OSU iscatter mpirun_rsh one_core
FAIL | 1 | mvapich2 OSU iscatterv mpirun_rsh one_core
FAIL | 1 | mvapich2 OSU latency mpirun_rsh one_core
FAIL | 1 | mvapich2 OSU latency_mp mpirun_rsh one_core
FAIL | 1 | mvapich2 OSU mbw_mr mpirun_rsh one_core
FAIL | 1 | mvapich2 OSU multi_lat mpirun_rsh one_core
FAIL | 1 | mvapich2 OSU put_bibw mpirun_rsh one_core
FAIL | 1 | mvapich2 OSU put_bw mpirun_rsh one_core
FAIL | 1 | mvapich2 OSU put_latency mpirun_rsh one_core
FAIL | 1 | mvapich2 OSU reduce mpirun_rsh one_core
FAIL | 1 | mvapich2 OSU reduce_scatter mpirun_rsh one_core
FAIL | 1 | mvapich2 OSU scatter mpirun_rsh one_core
FAIL | 1 | mvapich2 OSU scatterv mpirun_rsh one_core
Version-Release number of selected component (if applicable):
Clients: rdma-dev-26
Servers: rdma-dev-25
DISTRO=RHEL-9.1.0-20220509.3
+ [22-05-10 09:57:56] cat /etc/redhat-release
Red Hat Enterprise Linux release 9.1 Beta (Plow)
+ [22-05-10 09:57:56] uname -a
Linux rdma-dev-26.rdma.lab.eng.rdu2.redhat.com 5.14.0-86.el9.x86_64 #1 SMP PREEMPT_DYNAMIC Fri May 6 09:23:00 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux
+ [22-05-10 09:57:56] cat /proc/cmdline
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.14.0-86.el9.x86_64 root=/dev/mapper/rhel_rdma-dev26-root ro intel_idle.max_cstate=0 intremap=no_x2apic_optout processor.max_cstate=0 console=tty0 rd_NO_PLYMOUTH crashkernel=1G-4G:192M,4G-64G:256M,64G:512M resume=/dev/mapper/rhel_rdma-dev-26-swap rd.lvm.lv=rhel_rdma-dev-26/root rd.lvm.lv=rhel_rdma-dev-26/swap console=ttyS1,115200n81
+ [22-05-10 09:57:56] rpm -q rdma-core linux-firmware
rdma-core-37.2-1.el9.x86_64
linux-firmware-20220209-126.el9_0.noarch
+ [22-05-10 09:57:56] tail /sys/class/infiniband/bnxt_re0/fw_ver /sys/class/infiniband/bnxt_re1/fw_ver
==> /sys/class/infiniband/bnxt_re0/fw_ver <==
219.0.112.0
==> /sys/class/infiniband/bnxt_re1/fw_ver <==
219.0.112.0
+ [22-05-10 09:57:56] lspci
+ [22-05-10 09:57:56] grep -i -e ethernet -e infiniband -e omni -e ConnectX
02:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
02:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
03:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
03:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
04:00.0 Ethernet controller: Broadcom Inc. and subsidiaries BCM57508 NetXtreme-E 10Gb/25Gb/40Gb/50Gb/100Gb/200Gb Ethernet (rev 11)
04:00.1 Ethernet controller: Broadcom Inc. and subsidiaries BCM57508 NetXtreme-E 10Gb/25Gb/40Gb/50Gb/100Gb/200Gb Ethernet (rev 11)
Installed:
mpitests-mvapich2-5.8-1.el9.x86_64 mvapich2-2.3.6-3.el9.x86_64
How reproducible:
100%
Steps to Reproduce:
1. bring up the RDMA hosts mentioned above with the RHEL-9.1.0 build (DISTRO=RHEL-9.1.0-20220509.3)
2. set up the RDMA hosts for the mvapich2 benchmark tests
3. run one of the mvapich2 benchmarks with the "mpirun_rsh" command, as follows:
timeout --preserve-status --kill-after=5m 3m mpirun_rsh -hostfile /root/hfile_one_core -np 2 /usr/lib64/mvapich2/bin/mpitests-osu_allgatherv
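To repeat step 3 across several benchmarks, the command above can be wrapped in a small helper. This is a minimal sketch, not the test harness that produced the result list in this report: the function name run_osu and the LAUNCHER, HOSTFILE, BENCH_DIR and LOG_DIR variables are hypothetical, and the second column of the printed line is assumed to be the exit status of the launcher.

```bash
#!/usr/bin/env bash
# Sketch: run one OSU benchmark through mpirun_rsh and print a PASS/FAIL line
# in the style of the result list in this report.
# Assumptions: all variable names and the run_osu name are illustrative only.

LAUNCHER="${LAUNCHER:-mpirun_rsh}"
HOSTFILE="${HOSTFILE:-/root/hfile_one_core}"
BENCH_DIR="${BENCH_DIR:-/usr/lib64/mvapich2/bin}"
LOG_DIR="${LOG_DIR:-/tmp}"

run_osu() {
    local name="$1" rc=0
    # Same timeout wrapper and arguments as the reproducer command above;
    # output is kept in a per-benchmark log for later inspection.
    timeout --preserve-status --kill-after=5m 3m \
        "$LAUNCHER" -hostfile "$HOSTFILE" -np 2 \
        "$BENCH_DIR/mpitests-osu_${name}" \
        >"$LOG_DIR/osu_${name}.log" 2>&1 || rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "PASS | 0 | mvapich2 OSU ${name} mpirun_rsh one_core"
    else
        echo "FAIL | ${rc} | mvapich2 OSU ${name} mpirun_rsh one_core"
    fi
}
```

For example, `for b in allgatherv bw latency; do run_osu "$b"; done` would print one line per benchmark and leave the launcher output in `$LOG_DIR/osu_<name>.log`.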
Actual results:
[rdma-dev-26.rdma.lab.eng.rdu2.redhat.com:mpirun_rsh][child_handler] Error in init phase, aborting! (0/2 mpispawn connections)
+ [22-05-10 12:29:27] __MPI_check_result 1 mpitests-mvapich2 OSU /usr/lib64/mvapich2/bin/mpitests-osu_allgatherv mpirun_rsh /root/hfile_one_core
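The message above comes from mpirun_rsh itself, and "(0/2 mpispawn connections)" reads as none of the two expected mpispawn processes having connected back to the launcher. When triaging many saved logs, that signature can be picked out mechanically. The following is a minimal sketch; the function names are hypothetical and the log file is assumed to hold the captured mpirun_rsh output.

```bash
#!/usr/bin/env bash
# Sketch: classify a captured mpirun_rsh log by the failure signature quoted
# in this report. Function names are illustrative, not part of any tool.

# Succeeds (exit 0) if the log contains the init-phase abort message.
is_init_phase_failure() {
    grep -q 'Error in init phase, aborting!' "$1"
}

# Prints the "connected/expected" mpispawn count, e.g. 0/2, if present.
mpispawn_connections() {
    sed -n 's|.*(\([0-9]*/[0-9]*\) mpispawn connections).*|\1|p' "$1" | head -n 1
}
```

A log for which `is_init_phase_failure` succeeds and `mpispawn_connections` prints 0/2 matches the failure described here, as opposed to a benchmark that started and then failed later.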
Expected results:
Normal execution of the benchmarks with stats output
Additional info: