RHEL-6192

[RHEL9.1] fabtests on QEDR DEVICE result in core files

    • Type: Bug
    • Resolution: Done
    • rhel-9.1.0
    • fabtests
    • sst_network_drivers
    • ssg_networking

      Description of problem:

      Segfaults are observed at various points when fabtests is run on a QEDR RoCE device.

      Version-Release number of selected component (if applicable):

      Clients: rdma-dev-02
      Servers: rdma-dev-03

      DISTRO=RHEL-9.1.0-20220524.0

      + [22-06-06 17:19:15] cat /etc/redhat-release
      Red Hat Enterprise Linux release 9.1 Beta (Plow)

      + [22-06-06 17:19:15] uname -a
      Linux rdma-dev-03.rdma.lab.eng.rdu2.redhat.com 5.14.0-96.el9.x86_64 #1 SMP PREEMPT_DYNAMIC Thu May 19 07:21:30 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux

      + [22-06-06 17:19:15] cat /proc/cmdline
      BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.14.0-96.el9.x86_64 root=UUID=64d8440c-2085-4fae-8e88-a47815c73ee6 ro console=tty0 rd_NO_PLYMOUTH intel_iommu=on iommu=on crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M resume=UUID=26802d64-92b5-4143-b010-edb2a3cde899 console=ttyS1,115200

      + [22-06-06 17:19:15] rpm -q rdma-core linux-firmware
      rdma-core-37.2-1.el9.x86_64
      linux-firmware-20220509-126.el9.noarch

      + [22-06-06 17:19:15] tail /sys/class/infiniband/qedr0/fw_ver /sys/class/infiniband/qedr1/fw_ver
      ==> /sys/class/infiniband/qedr0/fw_ver <==
      8.59.1.0

      ==> /sys/class/infiniband/qedr1/fw_ver <==
      8.59.1.0

      + [22-06-06 17:19:15] lspci
      + [22-06-06 17:19:15] grep -i -e ethernet -e infiniband -e omni -e ConnectX
      02:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
      02:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
      08:00.0 Ethernet controller: QLogic Corp. FastLinQ QL45000 Series 25GbE Controller (rev 10)
      08:00.1 Ethernet controller: QLogic Corp. FastLinQ QL45000 Series 25GbE Controller (rev 10)

      + [22-06-06 17:19:15] grep psm
      + [22-06-06 17:19:15] rpm -qa
      psmisc-23.4-3.el9.x86_64
      libpsm2-11.2.206-1.el9.x86_64

      + [22-06-06 17:19:16] rpm -qa
      + [22-06-06 17:19:16] grep libibverbs
      libibverbs-37.2-1.el9.x86_64
      libibverbs-utils-37.2-1.el9.x86_64

      + [22-06-06 17:19:16] rpm -q fabtests
      fabtests-1.14.0-1.el9.x86_64

      How reproducible:

      100%

      Steps to Reproduce:

      1. Run the fabtests on QEDE IW with the above packages.

      Actual results:

      Running python As root:
      TIME                        PID   UID GID SIG     COREFILE EXE              SIZE
      Mon 2022-06-06 17:20:01 EDT 49986 0   0   SIGSEGV present  /usr/bin/fi_poll 130.9K
      Mon 2022-06-06 17:20:03 EDT 50034 0   0   SIGSEGV present  /usr/bin/fi_poll 130.6K
      Mon 2022-06-06 19:20:10 EDT 89072 0   0   SIGSEGV present  /usr/bin/fi_poll 130.4K
      Mon 2022-06-06 19:20:11 EDT 89111 0   0   SIGSEGV present  /usr/bin/fi_poll 130.5K
      total 544
      -rw-r-----. 1 root root 134097 Jun 6 17:20 core.fi_poll.0.0d900a6b147544268a0d7cdaf8aa3b1e.49986.1654550401000000.zst
      -rw-r-----. 1 root root 133801 Jun 6 17:20 core.fi_poll.0.0d900a6b147544268a0d7cdaf8aa3b1e.50034.1654550403000000.zst
      -rw-r-----. 1 root root 133606 Jun 6 19:20 core.fi_poll.0.0d900a6b147544268a0d7cdaf8aa3b1e.89072.1654557610000000.zst
      -rw-r-----. 1 root root 133653 Jun 6 19:20 core.fi_poll.0.0d900a6b147544268a0d7cdaf8aa3b1e.89111.1654557611000000.zst
      Red Hat Enterprise Linux release 9.1 Beta (Plow)
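
      The core file names listed above appear to follow the systemd-coredump naming scheme, core.<comm>.<uid>.<boot-id>.<pid>.<timestamp in microseconds>, with a compression suffix. Assuming that layout holds, a minimal Python sketch for splitting a name into its fields could look like the following; parse_core_name is a hypothetical helper written for this report, not part of any tool.

```python
from datetime import datetime, timezone


def parse_core_name(name):
    """Split a systemd-coredump style file name into its fields.

    Assumed layout: core.<comm>.<uid>.<boot_id>.<pid>.<timestamp_us>[.zst]
    """
    # Drop the compression suffix if present.
    stem = name[: -len(".zst")] if name.endswith(".zst") else name
    # Split from the right so a dot inside the command name is tolerated.
    head, uid, boot_id, pid, ts_us = stem.rsplit(".", 4)
    prefix, _, comm = head.partition(".")
    if prefix != "core" or not comm:
        raise ValueError(f"unexpected core file name: {name!r}")
    return {
        "comm": comm,
        "uid": int(uid),
        "boot_id": boot_id,
        "pid": int(pid),
        # The timestamp field is microseconds since the Unix epoch.
        "time_utc": datetime.fromtimestamp(int(ts_us) / 1_000_000, tz=timezone.utc),
    }


if __name__ == "__main__":
    info = parse_core_name(
        "core.fi_poll.0.0d900a6b147544268a0d7cdaf8aa3b1e.49986.1654550401000000.zst"
    )
    print(info["comm"], info["pid"], info["time_utc"].isoformat())
```

      For the first file this yields PID 49986 and 21:20:01 UTC, which corresponds to the 17:20:01 EDT entry in the table above. All four files share one boot ID, so the crashes occurred within a single boot.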

      Jun 06 17:20:01 rdma-dev-03.rdma.lab.eng.rdu2.redhat.com kernel: fi_poll[49986]: segfault at 18 ip 0000561ce4dae050 sp 00007ffc53787f20 error 4 in fi_poll[561ce4dad000+5000]

      Jun 06 17:20:02 rdma-dev-03.rdma.lab.eng.rdu2.redhat.com kernel: fi_poll[50034]: segfault at 18 ip 000055e4aef64050 sp 00007ffebc1daa30 error 4 in fi_poll[55e4aef63000+5000]

      Jun 06 19:20:10 rdma-dev-03.rdma.lab.eng.rdu2.redhat.com kernel: fi_poll[89072]: segfault at 18 ip 0000562afc441050 sp 00007ffd940844d0 error 4 in fi_poll[562afc440000+5000]

      Jun 06 19:20:11 rdma-dev-03.rdma.lab.eng.rdu2.redhat.com kernel: fi_poll[89111]: segfault at 18 ip 00005558c70e1050 sp 00007ffedcdeab00 error 4 in fi_poll[5558c70e0000+5000]
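
      The kernel lines above carry enough information to locate the faulting instruction without opening the core files. The following Python sketch parses one such line, subtracts the mapping base from the instruction pointer, and decodes the low three bits of the x86 page-fault error code. decode_segfault is a hypothetical helper, and the regular expression only assumes the message layout seen in this report.

```python
import re

# Layout of the x86 "segfault at ..." messages as they appear in this report.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Return the decoded fields of a kernel segfault line, or None if no match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m["err"], 16)
    return {
        "comm": m["comm"],
        "pid": int(m["pid"]),
        "fault_addr": int(m["addr"], 16),
        # Offset of the faulting instruction inside the reported mapping.
        "offset": int(m["ip"], 16) - int(m["base"], 16),
        # x86 page-fault error code: bit 0 = page present,
        # bit 1 = write access, bit 2 = user mode.
        "page_present": bool(err & 1),
        "access": "write" if err & 2 else "read",
        "mode": "user" if err & 4 else "kernel",
        "object": m["obj"],
    }


if __name__ == "__main__":
    line = (
        "kernel: fi_poll[49986]: segfault at 18 ip 0000561ce4dae050 "
        "sp 00007ffc53787f20 error 4 in fi_poll[561ce4dad000+5000]"
    )
    info = decode_segfault(line)
    print(hex(info["offset"]), hex(info["fault_addr"]), info["access"], info["mode"])
```

      All four reported crashes decode to the same offset, 0x1050, inside the fi_poll mapping, with a user-mode read of the unmapped address 0x18. That pattern is consistent with a field at offset 0x18 being read through a NULL pointer, although only a backtrace from the core files can confirm it. The offset is relative to the mapping named in the message, which is not necessarily the ELF load base.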

      Expected results:

      Normal completion without any segfault

      Additional info:

            Michal Schmidt (mschmidt@redhat.com)
            Brian Chae (bchae)
            Kamal Heib
            infiniband-qe