RHEL-6077

[RHEL-8.9] ucx_perftest in UCX produces core file in client host when tested on MLX5 IB devices

    • Bug
    • Resolution: Won't Do
    • Undefined
    • rhel-8.9.0
    • ucx
    • rhel-sst-network-drivers
    • ssg_networking
    • 1
    • False

      Description of problem:

      After the "ucx_perftest" tests, the RDMA client was left with SIG 6 (SIGABRT) core dumps of ucx_perftest. This happened when UCX was tested on MLX5 IB devices.

      TIME                        PID    UID GID SIG COREFILE EXE
      Sat 2023-05-27 14:05:06 EDT 331688   0   0   6 none     /usr/bin/ucx_perftest
      Sat 2023-05-27 14:08:42 EDT 331781   0   0   6 none     /usr/bin/ucx_perftest
      Sat 2023-05-27 14:11:08 EDT 331866   0   0   6 none     /usr/bin/ucx_perftest
      Sat 2023-05-27 14:14:44 EDT 331960   0   0   6 none     /usr/bin/ucx_perftest
      Sat 2023-05-27 14:17:10 EDT 332047   0   0   6 none     /usr/bin/ucx_perftest
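
For triage, the listing above can be tallied per executable and signal with a short script. This is only a minimal sketch: it assumes the text follows the column layout shown here (which looks like `coredumpctl list` output, with the timestamp taking four whitespace-separated fields), and the name `count_cores` is hypothetical, not part of UCX or systemd.

```python
from collections import Counter

# Listing copied from this report; the layout is assumed to match
# the TIME PID UID GID SIG COREFILE EXE columns shown above.
LISTING = """\
TIME PID UID GID SIG COREFILE EXE
Sat 2023-05-27 14:05:06 EDT 331688 0 0 6 none /usr/bin/ucx_perftest
Sat 2023-05-27 14:08:42 EDT 331781 0 0 6 none /usr/bin/ucx_perftest
Sat 2023-05-27 14:11:08 EDT 331866 0 0 6 none /usr/bin/ucx_perftest
Sat 2023-05-27 14:14:44 EDT 331960 0 0 6 none /usr/bin/ucx_perftest
Sat 2023-05-27 14:17:10 EDT 332047 0 0 6 none /usr/bin/ucx_perftest
"""


def count_cores(listing: str) -> Counter:
    """Count (executable, signal) pairs in a core dump listing."""
    counts = Counter()
    for line in listing.splitlines():
        fields = line.split()
        # A data row has 4 timestamp fields followed by
        # PID, UID, GID, SIG, COREFILE and EXE (10 fields in total).
        if len(fields) < 10 or not fields[4].isdigit():
            continue  # skip the header row and blank lines
        signal_number = int(fields[7])
        executable = fields[9]
        counts[(executable, signal_number)] += 1
    return counts


if __name__ == "__main__":
    for (executable, signal_number), n in count_cores(LISTING).items():
        print(f"{executable}: {n} core dump(s) with signal {signal_number}")
```

Splitting on whitespace rather than fixed column offsets keeps the sketch tolerant of differences in column padding between runs.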

      Version-Release number of selected component (if applicable):

      Clients: rdma-perf-03
      Servers: rdma-perf-02

      DISTRO=RHEL-8.9.0-20230521.41

      + [23-05-27 14:03:46] cat /etc/redhat-release
      Red Hat Enterprise Linux release 8.9 Beta (Ootpa)

      + [23-05-27 14:03:46] uname -a
      Linux rdma-perf-03.rdma.lab.eng.rdu2.redhat.com 4.18.0-492.el8.x86_64 #1 SMP Tue May 9 14:50:21 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux

      + [23-05-27 14:03:46] cat /proc/cmdline
      BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-492.el8.x86_64 root=UUID=ba31a4b2-48e9-406e-82c0-86137253e7c5 ro intel_idle.max_cstate=0 processor.max_cstate=0 intel_iommu=on iommu=on console=tty0 rd_NO_PLYMOUTH intel_idle.max_cstate=0 intremap=no_x2apic_optout processor.max_cstate=0 reboot=acpi crashkernel=auto resume=UUID=6342dea0-23ae-4c1d-8132-4d6e2037fa0e console=ttyS1,115200n81

      + [23-05-27 14:03:46] rpm -q rdma-core linux-firmware
      rdma-core-44.0-2.el8.1.x86_64
      linux-firmware-20230515-115.gitd1962891.el8.noarch

      + [23-05-27 14:03:46] tail /sys/class/infiniband/mlx5_0/fw_ver /sys/class/infiniband/mlx5_1/fw_ver
      ==> /sys/class/infiniband/mlx5_0/fw_ver <==
      16.33.1048

      ==> /sys/class/infiniband/mlx5_1/fw_ver <==
      16.33.1048

      + [23-05-27 14:03:46] lspci
      + [23-05-27 14:03:46] grep -i -e ethernet -e infiniband -e omni -e ConnectX
      03:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
      03:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
      03:00.2 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
      03:00.3 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
      07:00.0 Infiniband controller: Mellanox Technologies MT27800 Family [ConnectX-5]
      07:00.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]

      How reproducible:
      100%

      Steps to Reproduce:

      Please refer to the following Beaker test logs:

      https://beaker.engineering.redhat.com/jobs/7908288

      https://beaker-archive.hosts.prod.psi.bos.redhat.com/beaker-logs/2023/05/79082/7908288/13996201/161005246/752813761/resultoutputfile.log

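      Actual results:

      SIG 6 core dumps of /usr/bin/ucx_perftest are recorded on the client after the UCX tests (see the listing in the description).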

      Expected results:

      No core files produced after UCX tests

      Additional info:

              network-drivers-bugs@redhat.com (network-drivers-bugs group)
              bchae (Brian Chae)
              RH Bugzilla Integration
              infiniband-qe