Bug
Resolution: Won't Do
rhel-8.7.0
sst_network_drivers
Description of problem:
Segfaults are observed at various points when fabtests is run on a QEDR RoCE device.
Version-Release number of selected component (if applicable):
Clients: rdma-dev-02
Servers: rdma-perf-06
DISTRO=RHEL-8.7.0-20220524.0
+ [22-06-06 21:42:37] cat /etc/redhat-release
Red Hat Enterprise Linux release 8.7 Beta (Ootpa)
+ [22-06-06 21:42:37] uname -a
Linux rdma-dev-02.rdma.lab.eng.rdu2.redhat.com 4.18.0-393.el8.x86_64 #1 SMP Wed May 18 12:44:50 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux
+ [22-06-06 21:42:37] cat /proc/cmdline
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-393.el8.x86_64 root=UUID=f3b9d64f-1ad8-44bd-b339-3a4297ae3e9a ro console=tty0 rd_NO_PLYMOUTH intel_iommu=on iommu=on crashkernel=auto resume=UUID=b92a6a91-c13f-46c2-b3b6-e1d187ba4ac3 console=ttyS1,115200
+ [22-06-06 21:42:37] rpm -q rdma-core linux-firmware
rdma-core-37.2-1.el8.x86_64
linux-firmware-20220210-107.git6342082c.el8.noarch
+ [22-06-06 21:42:37] tail /sys/class/infiniband/qedr0/fw_ver /sys/class/infiniband/qedr1/fw_ver
==> /sys/class/infiniband/qedr0/fw_ver <==
8. 59. 1. 0
==> /sys/class/infiniband/qedr1/fw_ver <==
8. 59. 1. 0
+ [22-06-06 21:42:37] lspci
+ [22-06-06 21:42:37] grep -i -e ethernet -e infiniband -e omni -e ConnectX
02:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
02:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
08:00.0 Ethernet controller: QLogic Corp. FastLinQ QL45000 Series 25GbE Controller (rev 10)
08:00.1 Ethernet controller: QLogic Corp. FastLinQ QL45000 Series 25GbE Controller (rev 10)
+ [22-06-06 21:53:10] grep psm
psmisc-23.1-5.el8.x86_64
libpsm2-11.2.206-1.el8.x86_64
+ [22-06-06 21:53:11] rpm -qa
+ [22-06-06 21:53:11] grep libibverbs
libibverbs-37.2-1.el8.x86_64
libibverbs-utils-37.2-1.el8.x86_64
+ [22-06-06 21:53:12] RQA_pkg_install fabtests
+ [22-06-06 21:53:12] PKG_LIST=
+ [22-06-06 21:53:12] for p in "$@"
+ [22-06-06 21:53:12] rpm -q fabtests
fabtests-1.14.0-1.el8.x86_64
How reproducible:
100%
Steps to Reproduce:
1. Run fabtests on QEDE IW with the above packages.
Actual results:
TIME                            PID   UID   GID SIG COREFILE  EXE
Mon 2022-06-06 22:04:57 EDT   75235     0     0  11 present   /usr/bin/fi_poll
Mon 2022-06-06 22:05:01 EDT   75280     0     0  11 present   /usr/bin/fi_poll
Tue 2022-06-07 00:15:14 EDT  117329     0     0  11 present   /usr/bin/fi_poll
Tue 2022-06-07 00:15:18 EDT  117387     0     0  11 present   /usr/bin/fi_poll
total 936
rw-r----. 1 root root 233046 Jun 7 00:15 core.fi_poll.0.92cf828b70574a22b559e34b049b91c9.117329.1654575314000000.lz4
rw-r----. 1 root root 232933 Jun 7 00:15 core.fi_poll.0.92cf828b70574a22b559e34b049b91c9.117387.1654575317000000.lz4
rw-r----. 1 root root 233706 Jun 6 22:04 core.fi_poll.0.92cf828b70574a22b559e34b049b91c9.75235.1654567497000000.lz4
rw-r----. 1 root root 233787 Jun 6 22:05 core.fi_poll.0.92cf828b70574a22b559e34b049b91c9.75280.1654567501000000.lz4
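For matching core files to the crash list above, the file names can be split into their fields. The sketch below is illustrative only: it assumes the names follow the pattern core.COMM.UID.BOOTID.PID.TIMESTAMP.lz4 as they appear in this listing, and that the last field is microseconds since the Unix epoch. The helper name parse_core_name is hypothetical.

```python
from datetime import datetime, timezone

def parse_core_name(name):
    """Split a core file name as listed above into its fields.

    Assumed layout (inferred from the listing, not from a spec):
    core.COMM.UID.BOOTID.PID.TIMESTAMP_USEC[.lz4]
    """
    stem = name[:-len(".lz4")] if name.endswith(".lz4") else name
    # Split from the right, because COMM itself may contain dots.
    prefix, uid, boot_id, pid, ts_usec = stem.rsplit(".", 4)
    if not prefix.startswith("core."):
        raise ValueError("unexpected core file name: " + name)
    return {
        "comm": prefix[len("core."):],
        "uid": int(uid),
        "boot_id": boot_id,
        "pid": int(pid),
        # Assumption: timestamp is in microseconds since the epoch.
        "time_utc": datetime.fromtimestamp(int(ts_usec) // 1_000_000, tz=timezone.utc),
    }

info = parse_core_name(
    "core.fi_poll.0.92cf828b70574a22b559e34b049b91c9.117329.1654575314000000.lz4"
)
print(info["comm"], info["pid"], info["time_utc"].isoformat())
```

Under these assumptions, the PID 117329 core decodes to 04:15:14 UTC on 2022-06-07, which lines up with the 00:15:14 EDT entry in the crash list.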
Jun 06 22:04:57 rdma-dev-02.rdma.lab.eng.rdu2.redhat.com kernel: fi_poll[75235]: segfault at 18 ip 000055ad78c06fd1 sp 00007ffc716d0510 error 4 in fi_poll[55ad78c05000+11000]
Jun 06 22:05:01 rdma-dev-02.rdma.lab.eng.rdu2.redhat.com kernel: fi_poll[75280]: segfault at 18 ip 0000557945fd4fd1 sp 00007ffef3c7b910 error 4 in fi_poll[557945fd3000+11000]
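The two kernel lines above can be decoded to check whether both crashes hit the same instruction. The following is a minimal sketch, not part of the original report: it assumes the usual x86 page-fault error code bits (bit 0 = page present, bit 1 = write access, bit 2 = user mode) and computes the faulting offset as ip minus the mapping base shown in brackets. The function name decode_segfault is hypothetical.

```python
import re

# Matches the kernel "segfault at ..." message format seen in the log above.
SEGFAULT_RE = re.compile(
    r"(?P<comm>[^\s\[]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

def decode_segfault(line):
    """Extract the fault address, code offset and access type from one log line."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a segfault line")
    err = int(m["err"], 16)
    return {
        "comm": m["comm"],
        "pid": int(m["pid"]),
        "fault_addr": int(m["addr"], 16),
        # Offset of the faulting instruction inside the mapped object.
        "offset": int(m["ip"], 16) - int(m["base"], 16),
        # Assumed x86 page-fault error code bits.
        "page_present": bool(err & 1),
        "write": bool(err & 2),
        "user_mode": bool(err & 4),
    }

lines = [
    "kernel: fi_poll[75235]: segfault at 18 ip 000055ad78c06fd1 sp 00007ffc716d0510 error 4 in fi_poll[55ad78c05000+11000]",
    "kernel: fi_poll[75280]: segfault at 18 ip 0000557945fd4fd1 sp 00007ffef3c7b910 error 4 in fi_poll[557945fd3000+11000]",
]
for line in lines:
    d = decode_segfault(line)
    print(d["pid"], hex(d["offset"]), hex(d["fault_addr"]), d)
```

Both lines give the same offset, 0x1fd1, so the two crashes appear to be at the same instruction in fi_poll. Error 4 decodes to a user-mode read of an unmapped page, and the fault address 0x18 would be consistent with reading a field at offset 0x18 through a NULL pointer, although the core files would be needed to confirm that.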
Expected results:
Normal completion without any segfault
Additional info: