-
Bug
-
Resolution: Won't Do
-
Undefined
-
None
-
rhel-8.8.0
-
None
-
None
-
1
-
rhel-net-drivers
-
ssg_networking
-
1
-
False
-
False
-
-
None
-
Network Drivers 6
-
None
-
None
-
-
Unspecified
-
None
-
57,005
Description of problem:
Some of the OpenMPI benchmarks time out with RC1 when run on CXGB4 devices.
The failed benchmarks are the following:
FAIL | 1 | openmpi IMB-IO P_Write_indv mpirun one_core
FAIL | 1 | openmpi IMB-IO P_Write_expl mpirun one_core
FAIL | 1 | openmpi IMB-IO P_Write_shared mpirun one_core
FAIL | 1 | openmpi IMB-IO P_Write_priv mpirun one_core
FAIL | 1 | openmpi IMB-IO C_Write_indv mpirun one_core
FAIL | 1 | openmpi IMB-IO C_Write_expl mpirun one_core
FAIL | 1 | openmpi IMB-IO C_Write_shared mpirun one_core
FAIL | 1 | openmpi OSU get_acc_latency mpirun one_core
FAIL | 1 | openmpi OSU mbw_mr mpirun one_core
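The result lines above appear to follow a `STATUS | code | description` layout. As a minimal sketch for tallying the failures per suite, assuming the second field is a return code and the description splits into MPI flavor, suite, benchmark, launcher, and hostfile tag (these field names are labels chosen for illustration, not names used by the test harness):

```python
from collections import Counter

# Result lines copied verbatim from the report.
RESULT_LINES = """\
FAIL | 1 | openmpi IMB-IO P_Write_indv mpirun one_core
FAIL | 1 | openmpi IMB-IO P_Write_expl mpirun one_core
FAIL | 1 | openmpi IMB-IO P_Write_shared mpirun one_core
FAIL | 1 | openmpi IMB-IO P_Write_priv mpirun one_core
FAIL | 1 | openmpi IMB-IO C_Write_indv mpirun one_core
FAIL | 1 | openmpi IMB-IO C_Write_expl mpirun one_core
FAIL | 1 | openmpi IMB-IO C_Write_shared mpirun one_core
FAIL | 1 | openmpi OSU get_acc_latency mpirun one_core
FAIL | 1 | openmpi OSU mbw_mr mpirun one_core
"""


def parse_result(line):
    """Split one result line into a dict; the key names are hypothetical."""
    status, code, description = (field.strip() for field in line.split("|"))
    mpi, suite, benchmark, launcher, hosts = description.split()
    return {
        "status": status,
        "code": int(code),  # assumed to be the return code
        "mpi": mpi,
        "suite": suite,
        "benchmark": benchmark,
        "launcher": launcher,
        "hosts": hosts,
    }


results = [parse_result(line) for line in RESULT_LINES.splitlines() if line.strip()]
per_suite = Counter(r["suite"] for r in results)

for suite, count in per_suite.items():
    print(f"{suite}: {count} failed")
```

All seven IMB-IO failures are write benchmarks, which may help narrow down where to look.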
Version-Release number of selected component (if applicable):
Clients: rdma-perf-06
Servers: rdma-dev-13
DISTRO=RHEL-8.8.0-20221120.2
+ [22-11-22 17:26:26] cat /etc/redhat-release
Red Hat Enterprise Linux release 8.8 Beta (Ootpa)
+ [22-11-22 17:26:26] uname -a
Linux rdma-perf-06.rdma.lab.eng.rdu2.redhat.com 4.18.0-438.el8.x86_64 #1 SMP Mon Nov 14 13:08:07 EST 2022 x86_64 x86_64 x86_64 GNU/Linux
+ [22-11-22 17:26:26] cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-4.18.0-438.el8.x86_64 root=UUID=d2252d81-d5ef-4997-9348-d3eaf680ab28 ro crashkernel=auto resume=UUID=61814f59-03e7-45ad-95f8-7fd0c6e33c4c console=ttyS0,115200n81
+ [22-11-22 17:26:26] rpm -q rdma-core linux-firmware
rdma-core-41.0-1.el8.x86_64
linux-firmware-20220726-110.git150864a4.el8.noarch
+ [22-11-22 17:26:26] tail /sys/class/infiniband/cxgb4_0/fw_ver /sys/class/infiniband/hfi1_0/fw_ver /sys/class/infiniband/mlx5_0/fw_ver /sys/class/infiniband/mlx5_1/fw_ver /sys/class/infiniband/qedr0/fw_ver /sys/class/infiniband/qedr1/fw_ver
==> /sys/class/infiniband/cxgb4_0/fw_ver <==
1.27.0.0
==> /sys/class/infiniband/hfi1_0/fw_ver <==
1.27.0
==> /sys/class/infiniband/mlx5_0/fw_ver <==
20.99.5392
==> /sys/class/infiniband/mlx5_1/fw_ver <==
20.99.5392
==> /sys/class/infiniband/qedr0/fw_ver <==
8. 59. 1. 0
==> /sys/class/infiniband/qedr1/fw_ver <==
8. 59. 1. 0
+ [22-11-22 17:26:26] lspci
+ [22-11-22 17:26:26] grep -i -e ethernet -e infiniband -e omni -e ConnectX
19:00.0 Ethernet controller: QLogic Corp. FastLinQ QL41000 Series 10/25/40/50GbE Controller (rev 02)
19:00.1 Ethernet controller: QLogic Corp. FastLinQ QL41000 Series 10/25/40/50GbE Controller (rev 02)
19:00.2 Ethernet controller: QLogic Corp. FastLinQ QL41000 Series 10/25/40/50GbE Controller (rev 02)
19:00.3 Ethernet controller: QLogic Corp. FastLinQ QL41000 Series 10/25/40/50GbE Controller (rev 02)
5e:00.0 Ethernet controller: Chelsio Communications Inc T62100-LP-CR Unified Wire Ethernet Controller
5e:00.1 Ethernet controller: Chelsio Communications Inc T62100-LP-CR Unified Wire Ethernet Controller
5e:00.2 Ethernet controller: Chelsio Communications Inc T62100-LP-CR Unified Wire Ethernet Controller
5e:00.3 Ethernet controller: Chelsio Communications Inc T62100-LP-CR Unified Wire Ethernet Controller
5e:00.4 Ethernet controller: Chelsio Communications Inc T62100-LP-CR Unified Wire Ethernet Controller
af:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
af:00.1 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
d8:00.0 Fabric controller: Intel Corporation Omni-Path HFI Silicon 100 Series [discrete] (rev 11)
Installed:
mpitests-openmpi-5.8-1.el8.x86_64 openmpi-1:4.1.1-3.el8.x86_64
openmpi-devel-1:4.1.1-3.el8.x86_64
How reproducible:
50% of the time when run on CXGB4 devices
Steps to Reproduce:
1. With the above build, run openmpi on the CXGB4 devices specified above
2. On the client, run one of the failed benchmarks listed above
3. + [22-11-22 18:14:13] timeout --preserve-status --kill-after=5m 3m mpirun -hostfile /root/hfile_one_core -np 2 --allow-run-as-root --map-by node -mca btl_openib_warn_nonexistent_if 0 -mca btl_openib_if_include cxgb4_0:1 -mca mtl '^psm2,psm,ofi' -mca btl '^openib' --mca mtl_base_verbose 100 --mca btl_base_verbose 100 -mca pml ucx -mca osc ucx -x UCX_NET_DEVICES=cxgb4_iw --mca osc_ucx_verbose 100 --mca pml_ucx_verbose 100 mpitests-IMB-IO P_Write_indv -time 1.5
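To rerun the other failed IMB-IO benchmarks with the same settings, the command from step 3 can be rebuilt with only the benchmark name swapped. The sketch below is an illustration, not part of the test harness: `build_command` and its parameter names are hypothetical, and it covers only the `mpitests-IMB-IO` binary because the report does not show the command used for the OSU benchmarks.

```python
import shlex


def build_command(benchmark, run_limit="3m", kill_after="5m", per_sample_time="1.5"):
    """Return the argv list from step 3 with a different IMB-IO benchmark name."""
    return [
        "timeout", "--preserve-status", f"--kill-after={kill_after}", run_limit,
        "mpirun", "-hostfile", "/root/hfile_one_core", "-np", "2",
        "--allow-run-as-root", "--map-by", "node",
        "-mca", "btl_openib_warn_nonexistent_if", "0",
        "-mca", "btl_openib_if_include", "cxgb4_0:1",
        "-mca", "mtl", "^psm2,psm,ofi",
        "-mca", "btl", "^openib",
        "--mca", "mtl_base_verbose", "100",
        "--mca", "btl_base_verbose", "100",
        "-mca", "pml", "ucx",
        "-mca", "osc", "ucx",
        "-x", "UCX_NET_DEVICES=cxgb4_iw",
        "--mca", "osc_ucx_verbose", "100",
        "--mca", "pml_ucx_verbose", "100",
        "mpitests-IMB-IO", benchmark, "-time", per_sample_time,
    ]


# Print one shell-quoted command line per failed IMB-IO benchmark.
for name in ("P_Write_indv", "P_Write_expl", "P_Write_shared", "P_Write_priv",
             "C_Write_indv", "C_Write_expl", "C_Write_shared"):
    print(shlex.join(build_command(name)))
```

The list can also be passed directly to `subprocess.run` on the client, which avoids shell quoting of the `^` patterns.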
Actual results:
#----------------------------------------------------------------
- Intel(R) MPI Benchmarks 2021.3, MPI-IO part
#----------------------------------------------------------------
- Date : Tue Nov 22 18:16:25 2022
- Machine : x86_64
- System : Linux
- Release : 4.18.0-438.el8.x86_64
- Version : #1 SMP Mon Nov 14 13:08:07 EST 2022
- MPI Version : 3.1
- MPI Thread Environment:
- Calling sequence was:
- mpitests-IMB-IO P_Write_indv -time 1.5
- Minimum io portion in bytes: 0
- Maximum io portion in bytes: 4194304
#
#
#
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] pml_ucx.c:182 Got proc 0 address, size 38
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] pml_ucx.c:411 connecting to proc. 0
- List of Benchmarks to run:
- P_Write_Indv
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] pml_ucx.c:182 Got proc 1 address, size 38
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] pml_ucx.c:411 connecting to proc. 1
#-----------------------------------------------------------------------------
- Benchmarking P_Write_Indv
- #processes = 1
- ( 1 additional process waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
# - MODE: AGGREGATE
#
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec
0 1000 0.08 0.08 0.08 0.00
1 1000 319.52 319.52 319.52 0.00
2 1000 325.61 325.61 325.61 0.01
4 1000 319.90 319.90 319.90 0.01
8 1000 329.11 329.11 329.11 0.02
16 1000 326.20 326.20 326.20 0.05
32 1000 322.54 322.54 322.54 0.10
64 1000 325.89 325.89 325.89 0.20
128 1000 327.25 327.25 327.25 0.39
256 1000 330.26 330.26 330.26 0.78
512 1000 331.62 331.62 331.62 1.54
1024 1000 330.18 330.18 330.18 3.10
2048 1000 327.04 327.04 327.04 6.26
4096 1000 245.94 245.94 245.94 16.65
8192 1000 249.83 249.83 249.83 32.79
16384 1000 268.35 268.35 268.35 61.05
32768 1000 262.39 262.39 262.39 124.88
65536 640 270.77 270.77 270.77 242.04
131072 320 350.36 350.36 350.36 374.11
262144 160 521.02 521.02 521.02 503.14
524288 80 941.76 941.76 941.76 556.71
1048576 40 1785.27 1785.27 1785.27 587.35
2097152 20 2641.38 2641.38 2641.38 793.96
4194304 10 4394.20 4394.20 4394.20 954.51
#-----------------------------------------------------------------------------
- Benchmarking P_Write_Indv
- #processes = 1
- ( 1 additional process waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
# - MODE: NON-AGGREGATE
#
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec
0 100 1.67 1.67 1.67 0.00
1 100 271.89 271.89 271.89 0.00
2 100 312.69 312.69 312.69 0.01
4 100 321.63 321.63 321.63 0.01
8 100 327.70 327.70 327.70 0.02
16 100 326.18 326.18 326.18 0.05
32 100 328.91 328.91 328.91 0.10
64 100 317.61 317.61 317.61 0.20
128 100 322.36 322.36 322.36 0.40
256 100 321.40 321.40 321.40 0.80
512 100 329.73 329.73 329.73 1.55
1024 100 337.95 337.95 337.95 3.03
2048 100 323.15 323.15 323.15 6.34
4096 100 249.63 249.63 249.63 16.41
8192 100 242.56 242.56 242.56 33.77
16384 100 267.12 267.12 267.12 61.34
32768 100 271.87 271.87 271.87 120.53
65536 100 294.87 294.87 294.87 222.25
131072 100 366.33 366.33 366.33 357.80
262144 100 533.30 533.30 533.30 491.55
524288 80 960.17 960.17 960.17 546.04
1048576 40 1778.07 1778.07 1778.07 589.73
2097152 20 2809.93 2809.93 2809.93 746.34
4194304 10 4534.08 4534.08 4534.08 925.06
#-----------------------------------------------------------------------------
- Benchmarking P_Write_Indv
- #processes = 2
#-----------------------------------------------------------------------------
# - MODE: AGGREGATE
#
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec
0 1000 0.05 0.06 0.06 0.00
1 1000 333.12 333.16 333.14 0.00
2 1000 333.10 333.12 333.11 0.01
4 1000 333.13 333.16 333.14 0.01
8 1000 324.81 324.84 324.82 0.02
16 1000 324.82 324.85 324.83 0.05
32 1000 333.14 333.17 333.16 0.10
64 1000 333.14 333.17 333.16 0.19
128 1000 324.79 324.82 324.81 0.39
256 1000 333.11 333.15 333.13 0.77
512 1000 341.46 341.50 341.48 1.50
1024 1000 341.46 341.50 341.48 3.00
2048 1000 358.14 358.17 358.16 5.72
4096 1000 358.09 358.13 358.11 11.44
8192 1000 333.13 333.16 333.14 24.59
16384 1000 399.79 399.81 399.80 40.98
32768 1000 533.04 533.08 533.06 61.47
65536 640 780.85 780.91 780.88 83.92
131072 320 1353.36 1353.47 1353.41 96.84
262144 160 2550.44 2550.70 2550.57 102.77
524288 80 4996.77 4997.25 4997.01 104.92
1048576 40 9784.93 9786.05 9785.49 107.15
2097152 20 23321.54 23323.96 23322.75 89.91
4194304 10 39969.60 39973.08 39971.34 104.93
#-----------------------------------------------------------------------------
- Benchmarking P_Write_Indv
- #processes = 2
#-----------------------------------------------------------------------------
# - MODE: NON-AGGREGATE
#
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec
0 100 43.02 43.06 43.04 0.00
1 61 25542.98 25543.60 25543.29 0.00
2 61 24313.98 24314.42 24314.20 0.00
4 61 25132.74 25133.35 25133.05 0.00
8 60 24996.23 24996.90 24996.56 0.00
16 60 26246.28 26246.76 26246.52 0.00
32 60 25135.36 25135.84 25135.60 0.00
64 60 24996.26 24996.75 24996.50 0.00
128 60 24996.56 24997.05 24996.81 0.01
256 60 24996.77 24997.26 24997.02 0.01
512 60 24996.62 24997.11 24996.87 0.02
1024 60 18195.19 18195.61 18195.40 0.06
+ [22-11-22 18:17:15] __MPI_check_result 1 mpitests-openmpi IMB-IO P_Write_indv mpirun /root/hfile_one_core
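The final table stops at 1024 bytes, and the repetition count drops from 100 to about 60. A rough check of the timing, assuming `-time 1.5` caps each message size at about 1.5 s of measured time and that the two shell timestamps bracket the whole run (both are assumptions drawn from the log, not confirmed by the harness):

```python
from datetime import datetime

# (bytes, repetitions, t_avg in usec) copied from the last NON-AGGREGATE
# table (#processes = 2) in the output above.
ROWS = [
    (0, 100, 43.04),
    (1, 61, 25543.29),
    (2, 61, 24314.20),
    (4, 61, 25133.05),
    (8, 60, 24996.56),
    (16, 60, 26246.52),
    (32, 60, 25135.60),
    (64, 60, 24996.50),
    (128, 60, 24996.81),
    (256, 60, 24997.02),
    (512, 60, 24996.87),
    (1024, 60, 18195.40),
]

# Measured time per message size: repetitions * average time per repetition.
per_row_s = {size: reps * t_avg / 1e6 for size, reps, t_avg in ROWS}
total_s = sum(per_row_s.values())

# Wall clock between the mpirun launch line and the result check line.
fmt = "%y-%m-%d %H:%M:%S"
elapsed_s = (datetime.strptime("22-11-22 18:17:15", fmt)
             - datetime.strptime("22-11-22 18:14:13", fmt)).total_seconds()

for size, secs in per_row_s.items():
    print(f"{size:>5} bytes: {secs:.2f} s measured")
print(f"measured in this table: {total_s:.2f} s")
print(f"launch to result check: {elapsed_s:.0f} s")
```

Most rows from 1 to 512 bytes land near 1.5 s, which fits the per-size limit, and the wall clock of 182 s is close to the 3m limit given to `timeout`. The measured figures cover only the timed repetitions, so any time spent outside them is not visible in the table.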
Expected results:
Finish the benchmark testing with complete stats
Additional info:
Here is the debug output from the benchmark command.
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] mca: base: components_register: registering framework btl components
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] mca: base: components_register: found loaded component ofi
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] mca: base: components_register: component ofi register function successful
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] mca: base: components_register: found loaded component self
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] mca: base: components_register: component self register function successful
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] mca: base: components_register: found loaded component sm
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] mca: base: components_register: found loaded component tcp
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] mca: base: components_register: component tcp register function successful
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] mca: base: components_register: found loaded component usnic
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] mca: base: components_register: component usnic register function successful
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] mca: base: components_register: found loaded component vader
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] mca: base: components_register: component vader register function successful
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] mca: base: components_open: opening btl components
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] mca: base: components_open: found loaded component ofi
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] mca: base: components_open: component ofi open function successful
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] mca: base: components_open: found loaded component self
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] mca: base: components_open: component self open function successful
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] mca: base: components_open: found loaded component tcp
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] mca: base: components_open: component tcp open function successful
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] mca: base: components_open: found loaded component usnic
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] mca: base: components_open: component usnic open function successful
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] mca: base: components_open: found loaded component vader
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] mca: base: components_open: component vader open function successful
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] select: initializing btl component ofi
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] mca: base: components_register: registering framework btl components
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] mca: base: components_register: found loaded component ofi
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] mca: base: components_register: component ofi register function successful
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] mca: base: components_register: found loaded component self
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] mca: base: components_register: component self register function successful
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] mca: base: components_register: found loaded component sm
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] mca: base: components_register: found loaded component tcp
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] mca: base: components_register: component tcp register function successful
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] mca: base: components_register: found loaded component usnic
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] mca: base: components_register: component usnic register function successful
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] mca: base: components_register: found loaded component vader
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] mca: base: components_register: component vader register function successful
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] mca: base: components_open: opening btl components
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] mca: base: components_open: found loaded component ofi
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] mca: base: components_open: component ofi open function successful
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] mca: base: components_open: found loaded component self
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] mca: base: components_open: component self open function successful
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] mca: base: components_open: found loaded component tcp
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] mca: base: components_open: component tcp open function successful
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] mca: base: components_open: found loaded component usnic
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] mca: base: components_open: component usnic open function successful
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] mca: base: components_open: found loaded component vader
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] mca: base: components_open: component vader open function successful
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] select: initializing btl component ofi
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] select: init of component ofi returned success
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] select: initializing btl component self
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] select: init of component self returned success
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] select: initializing btl component tcp
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] btl: tcp: Searching for exclude address+prefix: 127.0.0.1 / 8
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] btl: tcp: Found match: 127.0.0.1 (lo)
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] btl:tcp: Attempting to bind to AF_INET port 1024
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] btl:tcp: Successfully bound to AF_INET port 1024
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] btl:tcp: my listening v4 socket is 0.0.0.0:1024
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] btl:tcp: examining interface cxgb4_iw
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] btl:tcp: using ipv6 interface cxgb4_iw
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] btl:tcp: examining interface cxgb4_iw.51
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] btl:tcp: using ipv6 interface cxgb4_iw.51
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] btl:tcp: examining interface cxgb4_iw.52
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] btl:tcp: using ipv6 interface cxgb4_iw.52
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] select: init of component tcp returned success
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] select: initializing btl component usnic
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] btl:usnic: disqualifiying myself due to fi_getinfo(3) failure: No data available (-61)
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] select: init of component usnic returned failure
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] mca: base: close: component usnic closed
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] mca: base: close: unloading component usnic
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] select: initializing btl component vader
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] select: init of component vader returned failure
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] mca: base: close: component vader closed
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] mca: base: close: unloading component vader
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] pml_ucx.c:197 mca_pml_ucx_open: UCX version 1.13.0
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] pml_ucx.c:289 mca_pml_ucx_init
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] pml_ucx.c:114 Pack remote worker address, size 38
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] pml_ucx.c:114 Pack local worker address, size 141
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] pml_ucx.c:351 created ucp context 0x55b7e3c42d40, worker 0x14b9400f6010
[create_qp:2753]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2753]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2753]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2753]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2753]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2753]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2753]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2753]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2753]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2753]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2753]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2753]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2753]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2753]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2753]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2753]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2753]create qp: failed on ibv_cmd_create_qp with 95
[create_qp:2753]create qp: failed on ibv_cmd_create_qp with 95
[create_qp:2753]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2753]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2753]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2753]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2753]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2753]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2753]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2753]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2753]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2753]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2753]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2753]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2753]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2753]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2753]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2753]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2753]create qp: failed on ibv_cmd_create_qp with 95
[create_qp:2753]create qp: failed on ibv_cmd_create_qp with 95
rdma-perf-06.rdma.lab.eng.rdu2.redhat.com.81852Wrong pkey 0x8001, please use PSM2_PKEY to specify a valid pkey
(err=23)
rdma-perf-06.rdma.lab.eng.rdu2.redhat.com.81852Wrong pkey 0x8001, please use PSM2_PKEY to specify a valid pkey
(err=23)
rdma-perf-06.rdma.lab.eng.rdu2.redhat.com.81852Wrong pkey 0x8001, please use PSM2_PKEY to specify a valid pkey
(err=23)
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] select: init of component ofi returned success
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] select: initializing btl component self
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] select: init of component self returned success
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] select: initializing btl component tcp
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] btl: tcp: Searching for exclude address+prefix: 127.0.0.1 / 8
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] btl: tcp: Found match: 127.0.0.1 (lo)
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] btl:tcp: Attempting to bind to AF_INET port 1024
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] btl:tcp: Successfully bound to AF_INET port 1024
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] btl:tcp: my listening v4 socket is 0.0.0.0:1024
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] btl:tcp: examining interface qede_roce
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] btl:tcp: using ipv6 interface qede_roce
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] btl:tcp: examining interface cxgb4_iw
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] btl:tcp: using ipv6 interface cxgb4_iw
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] btl:tcp: examining interface idrac
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] btl:tcp: using ipv6 interface idrac
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] btl:tcp: examining interface hfi1_opa0
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] btl:tcp: using ipv6 interface hfi1_opa0
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] btl:tcp: examining interface cxgb4_iw.51
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] btl:tcp: using ipv6 interface cxgb4_iw.51
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] btl:tcp: examining interface cxgb4_iw.52
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] btl:tcp: using ipv6 interface cxgb4_iw.52
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] btl:tcp: examining interface qede_roce.45
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] btl:tcp: using ipv6 interface qede_roce.45
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] btl:tcp: examining interface qede_roce.43
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] btl:tcp: using ipv6 interface qede_roce.43
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] btl:tcp: examining interface mlx5_ib0
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] btl:tcp: using ipv6 interface mlx5_ib0
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] btl:tcp: examining interface mlx5_ib1
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] btl:tcp: using ipv6 interface mlx5_ib1
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] select: init of component tcp returned success
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] select: initializing btl component usnic
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] btl:usnic: disqualifiying myself due to fi_getinfo(3) failure: No data available (-61)
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] select: init of component usnic returned failure
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] mca: base: close: component usnic closed
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] mca: base: close: unloading component usnic
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] select: initializing btl component vader
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] select: init of component vader returned failure
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] mca: base: close: component vader closed
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] mca: base: close: unloading component vader
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] pml_ucx.c:197 mca_pml_ucx_open: UCX version 1.13.0
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] pml_ucx.c:289 mca_pml_ucx_init
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] pml_ucx.c:114 Pack remote worker address, size 38
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] pml_ucx.c:114 Pack local worker address, size 141
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] pml_ucx.c:351 created ucp context 0x555f2893b700, worker 0x555f28f907d0
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] pml_ucx.c:182 Got proc 0 address, size 141
[rdma-perf-06.rdma.lab.eng.rdu2.redhat.com:81852] pml_ucx.c:411 connecting to proc. 0
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] pml_ucx.c:182 Got proc 1 address, size 141
[rdma-dev-13.rdma.lab.eng.rdu2.redhat.com:75607] pml_ucx.c:411 connecting to proc. 1
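The `create_qp` failures in the debug output above end in 22 or 95. Assuming these trailing numbers are Linux errno values returned by `ibv_cmd_create_qp` (the log does not say so explicitly), they can be decoded with the standard library when run on a Linux host:

```python
import errno
import os


def decode(code):
    """Return (symbolic name, message) for an errno value on this platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# Codes taken from the "failed on ibv_cmd_create_qp with N" lines.
for code in (22, 95):
    name, text = decode(code)
    print(f"{code}: {name} ({text})")
```

On Linux, 22 is `EINVAL` and 95 is `EOPNOTSUPP`. The `err=23` in the PSM2 pkey messages is left out here, since it may be a PSM2-specific code rather than an errno.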