Bug
Resolution: Won't Do
rhel-8.9.0
rhel-net-drivers
ssg_networking
Network Drivers 6
Description of problem:
The "ib_write_lat RC" test fails consistently when perftest is run on BCM57504. The run is killed by the 3-minute timer, which means data throughput was very low compared with the same test on other bnxt RoCE devices, such as BCM57414.
Version-Release number of selected component (if applicable):
Clients: rdma-qe-35
Servers: rdma-qe-34
DISTRO=RHEL-8.9.0-20230521.41
+ [23-05-27 15:56:21] cat /etc/redhat-release
Red Hat Enterprise Linux release 8.9 Beta (Ootpa)
+ [23-05-27 15:56:21] uname -a
Linux rdma-qe-35.rdma.lab.eng.rdu2.redhat.com 4.18.0-492.el8.x86_64 #1 SMP Tue May 9 14:50:21 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux
+ [23-05-27 15:56:21] cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-4.18.0-492.el8.x86_64 root=UUID=aeafc44d-6857-41fb-b528-f3a2697fa426 ro crashkernel=auto resume=UUID=e29086f9-4a66-4501-b566-8a60ef6e209a console=ttyS0,115200n81
+ [23-05-27 15:56:21] rpm -q rdma-core linux-firmware
rdma-core-44.0-2.el8.1.x86_64
linux-firmware-20230515-115.gitd1962891.el8.noarch
+ [23-05-27 15:56:21] tail /sys/class/infiniband/bnxt_re0/fw_ver /sys/class/infiniband/bnxt_re1/fw_ver /sys/class/infiniband/bnxt_re2/fw_ver /sys/class/infiniband/bnxt_re3/fw_ver
==> /sys/class/infiniband/bnxt_re0/fw_ver <==
216.4.16.0
==> /sys/class/infiniband/bnxt_re1/fw_ver <==
216.4.16.0
==> /sys/class/infiniband/bnxt_re2/fw_ver <==
216.4.16.0
==> /sys/class/infiniband/bnxt_re3/fw_ver <==
216.4.16.0
+ [23-05-27 15:56:21] lspci
+ [23-05-27 15:56:21] grep -i -e ethernet -e infiniband -e omni -e ConnectX
19:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
19:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
19:00.2 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
19:00.3 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
5e:00.0 Ethernet controller: Broadcom Inc. and subsidiaries BCM57504 NetXtreme-E 10Gb/25Gb/40Gb/50Gb/100Gb/200Gb Ethernet (rev 11)
5e:00.1 Ethernet controller: Broadcom Inc. and subsidiaries BCM57504 NetXtreme-E 10Gb/25Gb/40Gb/50Gb/100Gb/200Gb Ethernet (rev 11)
5e:00.2 Ethernet controller: Broadcom Inc. and subsidiaries BCM57504 NetXtreme-E 10Gb/25Gb/40Gb/50Gb/100Gb/200Gb Ethernet (rev 11)
5e:00.3 Ethernet controller: Broadcom Inc. and subsidiaries BCM57504 NetXtreme-E 10Gb/25Gb/40Gb/50Gb/100Gb/200Gb Ethernet (rev 11)
+ [23-05-27 15:56:21] rpm -q perftest
perftest-4.5.0.20-4.el8.x86_64
How reproducible:
100%
Steps to Reproduce:
1. On the server host, issue
timeout 3m ib_write_lat -a -c RC -d bnxt_re1 -i 1 -F -R
2. On the client host, issue
+ [23-05-27 15:58:05] timeout 3m ib_write_lat -a -c RC -d bnxt_re1 -i 1 -F -R 172.31.45.248
Actual results:
/usr/bin/rhts_sync_block -s ib_write_lat-RC-ready_-roce.45-0 rdma-qe-34 – Blocking state(s) = 14_ib_write_lat-RC-ready_-roce.45-0
+ [23-05-27 15:58:04] sleep 1
+ [23-05-27 15:58:05] timeout 3m ib_write_lat -a -c RC -d bnxt_re1 -i 1 -F -R 172.31.45.248
+ [23-05-27 16:01:05] RQA_check_result -r 124 -t 'ib_write_lat RC'
Refer to Beaker test job https://beaker.engineering.redhat.com/jobs/7898077, task T:160852234 (/kernel/infiniband/perftest, +00:37:33).
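The result code 124 passed to RQA_check_result above is the status GNU `timeout` returns when the wrapped command is still running at the time limit, so the failure is a timeout rather than an error reported by ib_write_lat itself. A minimal sketch of that interpretation follows; the function name `classify_timeout_rc` is hypothetical and is not part of the test suite, and `sleep` stands in for the real test command.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the RDMA test suite): interpret the
# exit status of a command that was run under GNU `timeout`.
classify_timeout_rc() {
    local rc=$1
    case "$rc" in
        0)   echo "pass" ;;      # command finished within the limit
        124) echo "timeout" ;;   # GNU timeout: command was still running at the limit
        *)   echo "fail" ;;      # the command itself returned an error
    esac
}

# Stand-in demonstration: `sleep 5` cannot finish within 1 second,
# so `timeout` terminates it and exits with status 124.
rc=0
timeout 1 sleep 5 || rc=$?
classify_timeout_rc "$rc"
```

In the failing run the same pattern applies with `timeout 3m ib_write_lat ...` in place of the stand-in command.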
Expected results:
When tested on BCM57414, the same test completes within the time limit and returns exit code 0:
+ [23-05-27 16:05:55] timeout 3m ib_write_lat -a -c RC -d bnxt_re3 -i 1 -F -R 172.31.45.24
---------------------------------------------------------------------------------------
RDMA_Write Latency Test
Dual-port : OFF Device : bnxt_re3
Number of qps : 1 Transport type : IB
Connection type : RC Using SRQ : OFF
PCIe relax order: OFF
ibv_wr* API : OFF
TX depth : 1
Mtu : 4096[B]
Link type : Ethernet
GID index : 3
Max inline data : 96[B]
rdma_cm QPs : ON
Data ex. method : rdma_cm
---------------------------------------------------------------------------------------
local address: LID 0000 QPN 0x01d2 PSN 0x2a0fc6
GID: 00:00:00:00:00:00:00:00:00:00:255:255:172:31:40:25
remote address: LID 0000 QPN 0x00d2 PSN 0x9723a5
GID: 00:00:00:00:00:00:00:00:00:00:255:255:172:31:40:24
---------------------------------------------------------------------------------------
#bytes   #iterations  t_min[usec]  t_max[usec]  t_typical[usec]  t_avg[usec]  t_stdev[usec]  99% percentile[usec]  99.9% percentile[usec]
2        1000         4.27         9.00         4.31             4.32         0.06           4.69                  9.00
4        1000         4.21         8.83         4.24             4.25         0.03           4.60                  8.83
8        1000         4.22         7.04         4.26             4.27         0.00           4.60                  7.04
16       1000         4.13         4.77         4.27             4.28         0.00           4.46                  4.77
32       1000         4.30         5.24         4.34             4.35         0.00           4.68                  5.24
64       1000         4.35         5.17         4.38             4.40         0.00           4.72                  5.17
128      1000         5.06         6.10         5.09             5.10         0.00           5.45                  6.10
256      1000         5.18         6.00         5.21             5.22         0.00           5.56                  6.00
512      1000         5.29         9.44         5.33             5.34         0.03           5.72                  9.44
1024     1000         5.60         6.43         5.64             5.67         0.00           6.06                  6.43
2048     1000         6.23         11.60        6.29             6.31         0.03           6.56                  11.60
4096     1000         7.52         13.62        7.57             7.60         0.11           7.92                  13.62
8192     1000         8.74         9.46         8.80             8.84         0.00           9.11                  9.46
16384    1000         11.44        12.50        11.64            11.67        0.00           12.14                 12.50
32768    1000         16.79        18.06        16.88            16.95        0.00           17.40                 18.06
65536    1000         27.50        28.33        27.56            27.64        0.00           28.17                 28.33
131072   1000         48.89        49.79        48.99            49.06        0.00           49.58                 49.79
262144   1000         91.74        96.25        91.83            91.89        0.03           92.28                 96.25
524288   1000         177.37       178.39       177.46           177.52       0.00           178.03                178.39
1048576  1000         348.66       350.79       348.77           348.84       0.04           349.33                350.79
2097152  1000         691.26       692.64       691.36           691.43       0.04           692.02                692.64
4194304  1000         1376.42      1378.20      1376.51          1376.60      0.11           1377.72               1378.20
8388608  1000         2746.77      2749.07      2746.87          2746.98      0.13           2748.31               2749.07
---------------------------------------------------------------------------------------
+ [23-05-27 16:06:16] RQA_check_result -r 0 -t 'ib_write_lat RC'
Refer to https://beaker.engineering.redhat.com/jobs/7898076, task /kernel/infiniband/perftest.
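For a rough sense of the data rate behind the BCM57414 figures, the large-message rows can be converted from latency to an implied transfer rate. This is only a sketch: it assumes t_avg approximates the time needed to move one message in one direction, and the function name `implied_gbps` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: implied_gbps BYTES USEC
# Converts a message size and a latency in microseconds into Gbit/s.
# bits / usec gives Mbit/s; dividing by 1000 gives Gbit/s.
implied_gbps() {
    awk -v bytes="$1" -v usec="$2" \
        'BEGIN { printf "%.2f\n", bytes * 8 / usec / 1000 }'
}

# Values taken from the BCM57414 table (message size, t_avg[usec]).
implied_gbps 8388608 2746.98   # prints 24.43
implied_gbps 1048576 348.84    # prints 24.05
```

The BCM57504 run produced no table before the timeout, so no comparable figure can be derived for it from this report.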
Additional info: