Bug
Resolution: Done
Description of problem:
Kernel panic when running the bbdev test on an ACC100 card.
Version-Release number of selected component (if applicable):
dpdk-22.11-3.el9_2.x86_64.rpm
pf-bb-config-22.11-3.el9.x86_64.rpm
How reproducible:
Steps to Reproduce:
Run the bbdev test setup as follows:
[root@dell-per740-61 ~]# modprobe vfio-pci
[root@dell-per740-61 ~]# modprobe pci_pf_stub
[root@dell-per740-61 ~]# modprobe vfio-pci enable_sriov=1
[root@dell-per740-61 ~]# lspci|grep accelerators
af:00.0 Processing accelerators: Intel Corporation Device 0d5c
[root@dell-per740-61 ~]# lspci -Dd 8086:0d5c | cut -d ' ' -f 1
0000:af:00.0
[root@dell-per740-61 ~]# dpdk-devbind.py -b vfio-pci 0000:af:00.0
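The address lookup and bind steps above could be combined so the PCI address is not copied by hand. This is only a minimal sketch: the function name first_pci_addr is hypothetical and not part of DPDK or pciutils, and it assumes the input has the same shape as the `lspci -D` output shown above (address in the first space-separated field).

```shell
#!/bin/sh
# Hypothetical helper (not part of DPDK or pciutils): read `lspci -D`-style
# lines on stdin and print the PCI address of the first device listed.
first_pci_addr() {
    cut -d ' ' -f 1 | head -n 1
}

# Intended use on the affected host (shown as comments, not executed here):
#   addr=$(lspci -Dd 8086:0d5c | first_pci_addr)
#   [ -n "$addr" ] && dpdk-devbind.py -b vfio-pci "$addr"
```

With the ACC100 device from this report, the helper would be expected to print 0000:af:00.0, which is the address passed to dpdk-devbind.py in the last step.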
Actual results:
The kernel panics after running "dpdk-devbind.py -b vfio-pci 0000:af:00.0". It does not always reproduce on this system, but it has occurred about 4 times.
Call trace:
[ 106.723198] Call Trace:
[ 106.723200] <NMI>
[ 106.723202] dump_stack_lvl+0x34/0x48
[ 106.723210] panic+0xea/0x2e4
[ 106.723217] __ghes_panic.cold+0x21/0x21
[ 106.723222] ghes_in_nmi_queue_one_entry.constprop.0+0x1d9/0x2a0
[ 106.723227] ghes_notify_nmi+0x59/0xd0
[ 106.723229] nmi_handle+0x5b/0x120
[ 106.723236] default_do_nmi+0x40/0x130
[ 106.723240] exc_nmi+0x111/0x140
[ 106.723242] end_repeat_nmi+0x16/0x67
[ 106.723249] RIP: 0010:intel_idle+0x55/0xa0
[ 106.723254] Code: 48 89 d1 65 48 8b 04 25 c0 11 03 00 0f 01 c8 48 8b 00 a8 08 75 14 66 90 0f 00 2d c2 13 4c 00 b9 01 00 00 00 48 89 f0 0f 01 c9 <65> 48 8b 04 25 c0 11 03 00 f0 80 60 02 df f0 83 44 24 fc 00 48 8b
[ 106.723256] RSP: 0018:ffffffffb8403e50 EFLAGS: 00000046
[ 106.723259] RAX: 0000000000000001 RBX: 0000000000000002 RCX: 0000000000000001
[ 106.723260] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffe1427fa00ca8
[ 106.723261] RBP: ffffe1427fa00ca8 R08: 0000000000000002 R09: 0000000000000008
[ 106.723262] R10: 00000000000003da R11: 00000000000003d8 R12: ffffffffb88b8d40
[ 106.723263] R13: ffffffffb88b8e28 R14: 0000000000000002 R15: 0000000000000000
[ 106.723265] ? intel_idle+0x55/0xa0
[ 106.723268] ? intel_idle+0x55/0xa0
[ 106.723270] </NMI>
[ 106.723271] <TASK>
[ 106.723271] cpuidle_enter_state+0x81/0x42a
[ 106.723274] cpuidle_enter+0x29/0x40
[ 106.723279] cpuidle_idle_call+0xfa/0x160
[ 106.723284] do_idle+0x78/0xe0
[ 106.723286] cpu_startup_entry+0x19/0x20
[ 106.723288] rest_init+0xca/0xd0
[ 106.723291] arch_call_rest_init+0xa/0x24
[ 106.723298] start_kernel+0x4a3/0x4c2
[ 106.723300] secondary_startup_64_no_verify+0xe5/0xeb
[ 106.723307] </TASK>
[ 0.000000] Linux version 5.14.0-348.el9.x86_64 (mockbuild@x86-vm-07.build.eng.bos.redhat.com) (gcc (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2), GNU ld version 2.35.2-42.el9) #1 SMP PREEMPT_DYNAMIC Mon Jul 31 18:52:45 EDT 2023
Expected results:
No kernel panic.
Additional info:
Kernel panic job:
https://beaker.engineering.redhat.com/jobs/8159304
Console log:
https://beaker-archive.hosts.prod.psi.bos.redhat.com/beaker-logs/2023/08/81593/8159304/14376283/console.log
It occurred on the following kernels:
5.14.0-348.el9.x86_64
5.14.0-284.17.1.el9.x86_64
5.14.0-284.26.1.el9.x86_64