- Bug
- Resolution: Done
- rhel-9.1.0
- sst_network_drivers
- ssg_networking
Description of problem:
Segfaults are observed at various points when fabtests is run on a QEDR RoCE device.
Version-Release number of selected component (if applicable):
Clients: rdma-dev-02
Servers: rdma-dev-03
DISTRO=RHEL-9.1.0-20220524.0
+ [22-06-06 17:19:15] cat /etc/redhat-release
Red Hat Enterprise Linux release 9.1 Beta (Plow)
+ [22-06-06 17:19:15] uname -a
Linux rdma-dev-03.rdma.lab.eng.rdu2.redhat.com 5.14.0-96.el9.x86_64 #1 SMP PREEMPT_DYNAMIC Thu May 19 07:21:30 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux
+ [22-06-06 17:19:15] cat /proc/cmdline
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.14.0-96.el9.x86_64 root=UUID=64d8440c-2085-4fae-8e88-a47815c73ee6 ro console=tty0 rd_NO_PLYMOUTH intel_iommu=on iommu=on crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M resume=UUID=26802d64-92b5-4143-b010-edb2a3cde899 console=ttyS1,115200
+ [22-06-06 17:19:15] rpm -q rdma-core linux-firmware
rdma-core-37.2-1.el9.x86_64
linux-firmware-20220509-126.el9.noarch
+ [22-06-06 17:19:15] tail /sys/class/infiniband/qedr0/fw_ver /sys/class/infiniband/qedr1/fw_ver
==> /sys/class/infiniband/qedr0/fw_ver <==
8.59.1.0
==> /sys/class/infiniband/qedr1/fw_ver <==
8.59.1.0
+ [22-06-06 17:19:15] lspci
+ [22-06-06 17:19:15] grep -i -e ethernet -e infiniband -e omni -e ConnectX
02:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
02:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
08:00.0 Ethernet controller: QLogic Corp. FastLinQ QL45000 Series 25GbE Controller (rev 10)
08:00.1 Ethernet controller: QLogic Corp. FastLinQ QL45000 Series 25GbE Controller (rev 10)
+ [22-06-06 17:19:15] grep psm
+ [22-06-06 17:19:15] rpm -qa
psmisc-23.4-3.el9.x86_64
libpsm2-11.2.206-1.el9.x86_64
+ [22-06-06 17:19:16] rpm -qa
+ [22-06-06 17:19:16] grep libibverbs
libibverbs-37.2-1.el9.x86_64
libibverbs-utils-37.2-1.el9.x86_64
+ [22-06-06 17:19:16] rpm -q fabtests
fabtests-1.14.0-1.el9.x86_64
How reproducible:
100%
Steps to Reproduce:
1. Run fabtests on QEDE IW with the packages listed above.
Actual results:
Running python As root:
TIME                        PID   UID GID SIG     COREFILE EXE              SIZE
Mon 2022-06-06 17:20:01 EDT 49986   0   0 SIGSEGV present  /usr/bin/fi_poll 130.9K
Mon 2022-06-06 17:20:03 EDT 50034   0   0 SIGSEGV present  /usr/bin/fi_poll 130.6K
Mon 2022-06-06 19:20:10 EDT 89072   0   0 SIGSEGV present  /usr/bin/fi_poll 130.4K
Mon 2022-06-06 19:20:11 EDT 89111   0   0 SIGSEGV present  /usr/bin/fi_poll 130.5K
total 544
-rw-r-----. 1 root root 134097 Jun 6 17:20 core.fi_poll.0.0d900a6b147544268a0d7cdaf8aa3b1e.49986.1654550401000000.zst
-rw-r-----. 1 root root 133801 Jun 6 17:20 core.fi_poll.0.0d900a6b147544268a0d7cdaf8aa3b1e.50034.1654550403000000.zst
-rw-r-----. 1 root root 133606 Jun 6 19:20 core.fi_poll.0.0d900a6b147544268a0d7cdaf8aa3b1e.89072.1654557610000000.zst
-rw-r-----. 1 root root 133653 Jun 6 19:20 core.fi_poll.0.0d900a6b147544268a0d7cdaf8aa3b1e.89111.1654557611000000.zst
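Note: the core file names in the listing above can be matched back to the coredumpctl entries. The sketch below assumes the usual systemd-coredump naming layout, core.<comm>.<uid>.<boot-id>.<pid>.<timestamp in microseconds>.zst, which is consistent with the names in this report but is not confirmed by it. The helper name `parse_core_name` is made up for this example, and the split would not work for a command name that itself contains a dot.

```python
from datetime import datetime, timezone

def parse_core_name(name):
    """Split a core file name, ASSUMING the layout
    core.<comm>.<uid>.<boot-id>.<pid>.<timestamp-us>.<ext>."""
    parts = name.split(".")
    if len(parts) != 7 or parts[0] != "core":
        raise ValueError(f"unexpected core file name: {name!r}")
    _, comm, uid, boot_id, pid, ts_us, _ext = parts
    # Timestamp is taken to be microseconds since the Unix epoch.
    when = datetime.fromtimestamp(int(ts_us) // 1_000_000, tz=timezone.utc)
    return {
        "comm": comm,
        "uid": int(uid),
        "boot_id": boot_id,
        "pid": int(pid),
        "time_utc": when,
    }

# First file name from the listing in this report.
info = parse_core_name(
    "core.fi_poll.0.0d900a6b147544268a0d7cdaf8aa3b1e.49986.1654550401000000.zst"
)
print(info["comm"], info["pid"], info["time_utc"].isoformat())
```

For the first file this yields PID 49986 and 21:20:01 UTC on 2022-06-06, which is 17:20:01 EDT and matches the first coredumpctl row.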
Red Hat Enterprise Linux release 9.1 Beta (Plow)
Jun 06 17:20:01 rdma-dev-03.rdma.lab.eng.rdu2.redhat.com kernel: fi_poll[49986]: segfault at 18 ip 0000561ce4dae050 sp 00007ffc53787f20 error 4 in fi_poll[561ce4dad000+5000]
Jun 06 17:20:02 rdma-dev-03.rdma.lab.eng.rdu2.redhat.com kernel: fi_poll[50034]: segfault at 18 ip 000055e4aef64050 sp 00007ffebc1daa30 error 4 in fi_poll[55e4aef63000+5000]
Jun 06 19:20:10 rdma-dev-03.rdma.lab.eng.rdu2.redhat.com kernel: fi_poll[89072]: segfault at 18 ip 0000562afc441050 sp 00007ffd940844d0 error 4 in fi_poll[562afc440000+5000]
Jun 06 19:20:11 rdma-dev-03.rdma.lab.eng.rdu2.redhat.com kernel: fi_poll[89111]: segfault at 18 ip 00005558c70e1050 sp 00007ffedcdeab00 error 4 in fi_poll[5558c70e0000+5000]
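Note: the four kernel lines above can be compared by subtracting the mapping base (the address in square brackets) from the faulting instruction pointer. The sketch below is a minimal parser for that purpose; the regular expression and the name `parse_segfault` are assumptions made for this example and are not part of fabtests or any existing tool. The error-code decoding follows the x86 page-fault bits (bit 0 protection, bit 1 write, bit 2 user mode).

```python
import re

# Format as it appears in this report:
#   <comm>[<pid>]: segfault at <addr> ip <ip> sp <sp> error <code> in <obj>[<base>+<size>]
SEGFAULT_RE = re.compile(
    r"(?P<comm>[^\s\[]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<obj>[^\s\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

def parse_segfault(line):
    """Return fault details for one kernel log line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m["err"])
    return {
        "pid": int(m["pid"]),
        "fault_addr": int(m["addr"], 16),
        # Offset of the faulting instruction inside the mapped object.
        "offset": int(m["ip"], 16) - int(m["base"], 16),
        "page_present": bool(err & 1),
        "write": bool(err & 2),
        "user_mode": bool(err & 4),
    }

# Message parts of the four log lines in this report.
LOG_LINES = [
    "fi_poll[49986]: segfault at 18 ip 0000561ce4dae050 sp 00007ffc53787f20 error 4 in fi_poll[561ce4dad000+5000]",
    "fi_poll[50034]: segfault at 18 ip 000055e4aef64050 sp 00007ffebc1daa30 error 4 in fi_poll[55e4aef63000+5000]",
    "fi_poll[89072]: segfault at 18 ip 0000562afc441050 sp 00007ffd940844d0 error 4 in fi_poll[562afc440000+5000]",
    "fi_poll[89111]: segfault at 18 ip 00005558c70e1050 sp 00007ffedcdeab00 error 4 in fi_poll[5558c70e0000+5000]",
]

faults = [parse_segfault(line) for line in LOG_LINES]
offsets = {hex(f["offset"]) for f in faults}
addrs = {hex(f["fault_addr"]) for f in faults}
print("offsets:", offsets, "fault addresses:", addrs)
```

All four crashes resolve to the same offset, 0x1050, within the fi_poll mapping, with fault address 0x18 and a user-mode read of a non-present page. That pattern would be consistent with a read through a NULL pointer at offset 0x18, but that is an inference from the log lines only and has not been checked against the core files.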
Expected results:
Normal completion without any segfault
Additional info:
- duplicates
- RHEL-6072 [RHEL9] fabtests result in many core files