Fast Datapath Product / FDP-675

Kernel panic when running the bbdev test on an ACC100 card with dpdk-22.11-3.el9_2 and dpdk-23.11

    • Type: Bug
    • Resolution: Done

      Description of problem:
      Kernel panic when running the bbdev test on an ACC100 card.

      Version-Release number of selected component (if applicable):
      dpdk-22.11-3.el9_2.x86_64.rpm
      pf-bb-config-22.11-3.el9.x86_64.rpm

      How reproducible:

      Steps to Reproduce:
      Run the bbdev test as follows:
      [root@dell-per740-61 ~]# modprobe vfio-pci
      [root@dell-per740-61 ~]# modprobe pci_pf_stub
      [root@dell-per740-61 ~]# modprobe vfio-pci enable_sriov=1
      [root@dell-per740-61 ~]# lspci|grep accelerators
      af:00.0 Processing accelerators: Intel Corporation Device 0d5c
      [root@dell-per740-61 ~]# lspci -Dd 8086:0d5c | cut -d ' ' -f 1
      0000:af:00.0
      [root@dell-per740-61 ~]# dpdk-devbind.py -b vfio-pci 0000:af:00.0
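Because the panic is intermittent, repeating the bind step may help when trying to reproduce it. The following is a minimal sketch, not part of the original report: the helper names `first_pci_addr` and `rebind_loop` and the `DEVBIND` override are hypothetical, and it is an assumption that an unbind/bind cycle exercises the same path as the single bind shown above.

```shell
#!/bin/sh
# Sketch only: helper names and the DEVBIND override are hypothetical.

# Print the PCI address (first field) from the first line of
# `lspci -D` style input, mirroring the `cut` step in the report.
first_pci_addr() {
    head -n 1 | cut -d ' ' -f 1
}

# Run COUNT unbind/bind cycles against ADDR.
# DEVBIND defaults to dpdk-devbind.py; set DEVBIND=echo for a dry run
# that only prints the arguments that would be passed.
rebind_loop() {
    addr=$1
    count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        echo "iteration $i: binding $addr to vfio-pci"
        # Release the device from its current driver, if any.
        ${DEVBIND:-dpdk-devbind.py} -u "$addr"
        # Bind to vfio-pci; stop the loop if the bind fails.
        ${DEVBIND:-dpdk-devbind.py} -b vfio-pci "$addr" || return 1
        i=$((i + 1))
    done
}
```

A dry run could look like `export DEVBIND=echo; rebind_loop 0000:af:00.0 3`, which prints the commands without touching the device. On the real system the address could be taken from `lspci -Dd 8086:0d5c | first_pci_addr`.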

      Actual results:
      The kernel panics after running "dpdk-devbind.py -b vfio-pci 0000:af:00.0". The panic does not reproduce every time on this system, but it has occurred about 4 times.
      call trace:
      [ 106.723198] Call Trace:
      [ 106.723200] <NMI>
      [ 106.723202] dump_stack_lvl+0x34/0x48
      [ 106.723210] panic+0xea/0x2e4
      [ 106.723217] __ghes_panic.cold+0x21/0x21
      [ 106.723222] ghes_in_nmi_queue_one_entry.constprop.0+0x1d9/0x2a0
      [ 106.723227] ghes_notify_nmi+0x59/0xd0
      [ 106.723229] nmi_handle+0x5b/0x120
      [ 106.723236] default_do_nmi+0x40/0x130
      [ 106.723240] exc_nmi+0x111/0x140
      [ 106.723242] end_repeat_nmi+0x16/0x67
      [ 106.723249] RIP: 0010:intel_idle+0x55/0xa0
      [ 106.723254] Code: 48 89 d1 65 48 8b 04 25 c0 11 03 00 0f 01 c8 48 8b 00 a8 08 75 14 66 90 0f 00 2d c2 13 4c 00 b9 01 00 00 00 48 89 f0 0f 01 c9 <65> 48 8b 04 25 c0 11 03 00 f0 80 60 02 df f0 83 44 24 fc 00 48 8b
      [ 106.723256] RSP: 0018:ffffffffb8403e50 EFLAGS: 00000046
      [ 106.723259] RAX: 0000000000000001 RBX: 0000000000000002 RCX: 0000000000000001
      [ 106.723260] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffe1427fa00ca8
      [ 106.723261] RBP: ffffe1427fa00ca8 R08: 0000000000000002 R09: 0000000000000008
      [ 106.723262] R10: 00000000000003da R11: 00000000000003d8 R12: ffffffffb88b8d40
      [ 106.723263] R13: ffffffffb88b8e28 R14: 0000000000000002 R15: 0000000000000000
      [ 106.723265] ? intel_idle+0x55/0xa0
      [ 106.723268] ? intel_idle+0x55/0xa0
      [ 106.723270] </NMI>
      [ 106.723271] <TASK>
      [ 106.723271] cpuidle_enter_state+0x81/0x42a
      [ 106.723274] cpuidle_enter+0x29/0x40
      [ 106.723279] cpuidle_idle_call+0xfa/0x160
      [ 106.723284] do_idle+0x78/0xe0
      [ 106.723286] cpu_startup_entry+0x19/0x20
      [ 106.723288] rest_init+0xca/0xd0
      [ 106.723291] arch_call_rest_init+0xa/0x24
      [ 106.723298] start_kernel+0x4a3/0x4c2
      [ 106.723300] secondary_startup_64_no_verify+0xe5/0xeb
      [ 106.723307] </TASK>
      [ 0.000000] Linux version 5.14.0-348.el9.x86_64 (mockbuild@x86-vm-07.build.eng.bos.redhat.com) (gcc (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2), GNU ld version 2.35.2-42.el9) #1 SMP PREEMPT_DYNAMIC Mon Jul 31 18:52:45 EDT 2023
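The trace reaches panic() through __ghes_panic from the GHES NMI handler while the CPU was idle in intel_idle. That suggests the panic was raised for a firmware-reported hardware error (ACPI APEI Generic Hardware Error Source) rather than by a fault in the interrupted idle code. A minimal sketch for checking a saved log for this signature follows. The function names are hypothetical, and whether the attached logs contain "[Hardware Error]" records is an assumption, since the report does not quote any.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical.

# Succeed if the log (file arguments, or stdin when none are given)
# contains the GHES panic frame seen in the call trace.
is_ghes_panic() {
    grep -q '__ghes_panic' "$@"
}

# Print any firmware-reported hardware error records found in the log.
# Returns non-zero when the log has no such lines.
hardware_error_lines() {
    grep -F '[Hardware Error]' "$@"
}
```

For example, `is_ghes_panic vmcore-dmesg.txt && hardware_error_lines vmcore-dmesg.txt` would print the error records only when the GHES panic frame is present.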

      Expected results:
      No kernel panic.

      Additional info:
      kernel panic job:
      https://beaker.engineering.redhat.com/jobs/8159304
      console log:
      https://beaker-archive.hosts.prod.psi.bos.redhat.com/beaker-logs/2023/08/81593/8159304/14376283/console.log

      It occurred on the following kernels:
      5.14.0-348.el9.x86_64
      5.14.0-284.17.1.el9.x86_64
      5.14.0-284.26.1.el9.x86_64

        1. kexec-dmesg.log
          157 kB
          Ting Li
        2. vmcore-dmesg.txt
          145 kB
          Ting Li

              mcoqueli@redhat.com Maxime Coquelin
              tli@redhat.com Ting Li