  Fast Datapath Product / FDP-327

ice driver: kernel panic when running the OVS-DPDK PVP performance case on OVS 3.2

    • Type: Bug
    • Resolution: Unresolved
    • rhel-net-ovs-dpdk
    • ssg_networking

      Description of problem:
      With the ice driver, the kernel logs call traces while the OVS-DPDK PVP performance case runs on OVS 3.2.

      Version-Release number of selected component (if applicable):
      [root@dell-per740-57 crash]# rpm -qa|grep dpdk
      dpdk-22.11-1.el9.x86_64
      dpdk-tools-22.11-1.el9.x86_64
      [root@dell-per740-57 crash]# rpm -qa|grep driverctl
      driverctl-0.111-2.el9.noarch
      [root@dell-per740-57 crash]# rpm -qa|grep openvswitch
      kernel-kernel-networking-openvswitch-common-4.0-4.noarch
      kernel-kernel-networking-openvswitch-perf-1.0-438.noarch
      openvswitch-selinux-extra-policy-1.0-34.el9fdp.noarch
      openvswitch3.2-3.2.0-0.2.el9fdp.x86_64
      [root@dell-per740-57 crash]# uname -r
      5.14.0-284.27.1.el9_2.x86_64

      How reproducible:

      Steps to Reproduce:
      Run the OVS-DPDK vhost-user PVP performance case.

      Actual results:
      Many kernel call traces are logged and the job hangs.
      Logging in to the system and checking the running processes shows that it is still executing "driverctl set-override 0000:3b:00.0 vfio-pci"; both driverctl processes are in state D (uninterruptible sleep):
      [root@dell-per740-57 ~]# ps aux|grep driverctl
      root 1601035 0.0 0.0 7312 3896 ? D 12:05 0:00 /usr/bin/bash /usr/sbin/driverctl set-override 0000:3b:00.0 vfio-pci
      root 1602051 0.0 0.0 7252 436 pts/0 D 20:58 0:00 /usr/bin/bash /usr/sbin/driverctl -v list-overrides
      root 1602509 0.0 0.0 6408 2324 pts/2 S+ 21:40 0:00 grep --color=auto driverctl
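      Because both driverctl processes are blocked in state D, their kernel stacks would help show where they are stuck. A minimal sketch, assuming root access on the affected host and a kernel that exposes `/proc/<pid>/stack`, prints the driver currently bound to 0000:3b:00.0 and the kernel stack of every task in uninterruptible sleep (`list_d_state` and `current_driver` are hypothetical helper names, not part of driverctl):

```shell
#!/bin/bash
# Triage helpers for a hung driverctl rebind (run as root on the affected host).

# Read "ps -eo pid,stat,args" output on stdin and print the PIDs whose
# state begins with D (uninterruptible sleep), as seen for driverctl above.
list_d_state() {
    awk 'NR > 1 && $2 ~ /^D/ { print $1 }'
}

# Print the name of the driver currently bound to a PCI device, or "none".
current_driver() {
    local link="/sys/bus/pci/devices/$1/driver"
    if [ -e "$link" ]; then
        basename "$(readlink "$link")"
    else
        echo none
    fi
}

echo "0000:3b:00.0 bound to: $(current_driver 0000:3b:00.0)"
for pid in $(ps -eo pid,stat,args | list_d_state); do
    echo "=== PID ${pid}: $(tr '\0' ' ' < "/proc/${pid}/cmdline")"
    # /proc/<pid>/stack shows where the task is blocked in the kernel (root only).
    cat "/proc/${pid}/stack" 2>/dev/null || echo "(stack unavailable)"
done
```

      Writing `w` to `/proc/sysrq-trigger` is an alternative that dumps all blocked (D-state) tasks to the kernel log in one step.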

      Call trace log (RCU self-detected stall on CPU 7):
      [ 6703.382640] NMI backtrace for cpu 7
      [ 6703.386132] CPU: 7 PID: 11 Comm: kworker/u96:0 Kdump: loaded Tainted: G I -------- --- 5.14.0-284.27.1.el9_2.x86_64 #1
      [ 6703.398203] Hardware name: Dell Inc. PowerEdge R740/06WXJT, BIOS 2.12.2 07/09/2021
      [ 6703.405771] Workqueue: netns cleanup_net
      [ 6703.409696] Call Trace:
      [ 6703.412150] <IRQ>
      [ 6703.414169] dump_stack_lvl+0x34/0x48
      [ 6703.417834] nmi_cpu_backtrace.cold+0x30/0x6f
      [ 6703.422192] ? lapic_can_unplug_cpu+0x80/0x80
      [ 6703.426553] nmi_trigger_cpumask_backtrace+0xef/0x110
      [ 6703.431608] trigger_single_cpu_backtrace+0x2a/0x31
      [ 6703.436487] rcu_dump_cpu_stacks+0xa7/0xe4
      [ 6703.440585] print_cpu_stall.cold+0x4f/0x17d
      [ 6703.444857] check_cpu_stall+0xe9/0x240
      [ 6703.448699] rcu_pending+0x26/0x190
      [ 6703.452191] rcu_sched_clock_irq+0x3d/0x180
      [ 6703.456376] update_process_times+0x8c/0xc0
      [ 6703.460562] tick_sched_handle+0x22/0x60
      [ 6703.464489] tick_sched_timer+0x65/0x80
      [ 6703.468329] ? tick_sched_do_timer+0xa0/0xa0
      [ 6703.472601] __hrtimer_run_queues+0x127/0x2c0
      [ 6703.476959] hrtimer_interrupt+0xfc/0x210
      [ 6703.480975] __sysvec_apic_timer_interrupt+0x5c/0x110
      [ 6703.486025] sysvec_apic_timer_interrupt+0x6d/0x90
      [ 6703.490817] </IRQ>
      [ 6703.492922] <TASK>
      [ 6703.495022] asm_sysvec_apic_timer_interrupt+0x16/0x20
      [ 6703.500162] RIP: 0010:xas_start+0x1b/0xd0
      [ 6703.504173] Code: 4c 89 4f 08 c3 cc cc cc cc 4c 89 c0 eb a5 90 48 8b 57 18 48 89 d0 83 e0 03 74 6b 48 83 f8 02 75 09 48 81 fa 05 c0 ff ff 77 31 <48> 8b 07 48 8b 57 08 48 8b 40 08 48 89 c1 83 e1 03 48 83 f9 02 75
      [ 6703.522919] RSP: 0018:ffffb4f0003c7d80 EFLAGS: 00000202
      [ 6703.528146] RAX: 0000000000000003 RBX: 0000000000000002 RCX: ffffffffac704180
      [ 6703.535279] RDX: 0000000000000003 RSI: 0000000000000004 RDI: ffffb4f0003c7d88
      [ 6703.542411] RBP: ffff8daa07ab0000 R08: 0000000000000001 R09: 0000000000000230
      [ 6703.549543] R10: ffffffffffffffff R11: ffffffffffffffc0 R12: ffffffffaadbf6d0
      [ 6703.556679] R13: ffff8db1c5cac100 R14: ffffffffade17da0 R15: ffffffffade17d54
      [ 6703.563811] ? xas_find+0x1c0/0x1c0
      [ 6703.567306] xa_get_mark+0x59/0x100
      [ 6703.570796] devlinks_xa_find_get.constprop.0+0x51/0x90
      [ 6703.576023] devlink_pernet_pre_exit+0x3c/0xf0
      [ 6703.580467] ? mutex_lock+0xe/0x30
      [ 6703.583874] cleanup_net+0x1d2/0x380
      [ 6703.587454] process_one_work+0x1e2/0x3b0
      [ 6703.591468] worker_thread+0x50/0x3a0
      [ 6703.595132] ? rescuer_thread+0x390/0x390
      [ 6703.599144] kthread+0xd6/0x100
      [ 6703.602291] ? kthread_complete_and_exit+0x20/0x20
      [ 6703.607083] ret_from_fork+0x1f/0x30
      [ 6703.610666] </TASK>
      [ 6746.003796] restraintd[1740]: *** Current Time: Thu Aug 10 12:07:22 2023 Localwatchdog at: Thu Aug 17 10:13:22 2023
      [ 6806.003856] restraintd[1740]: *** Current Time: Thu Aug 10 12:08:22 2023 Localwatchdog at: Thu Aug 17 10:13:22 2023
      [ 6866.003735] restraintd[1740]: *** Current Time: Thu Aug 10 12:09:22 2023 Localwatchdog at: Thu Aug 17 10:13:22 2023
      [ 6883.618814] rcu: INFO: rcu_preempt self-detected stall on CPU
      [ 6883.624562] rcu: 7-....: (240003 ticks this GP) idle=349/1/0x4000000000000000 softirq=485496/485496 fqs=60006
      [ 6883.634644] (t=240269 jiffies g=5329081 q=2818 ncpus=48)
      [ 6883.640043] NMI backtrace for cpu 7
      [ 6883.643534] CPU: 7 PID: 11 Comm: kworker/u96:0 Kdump: loaded Tainted: G I -------- --- 5.14.0-284.27.1.el9_2.x86_64 #1
      [ 6883.655607] Hardware name: Dell Inc. PowerEdge R740/06WXJT, BIOS 2.12.2 07/09/2021
      [ 6883.663175] Workqueue: netns cleanup_net
      [ 6883.667099] Call Trace:
      [ 6883.669553] <IRQ>
      [ 6883.671574] dump_stack_lvl+0x34/0x48
      [ 6883.675239] nmi_cpu_backtrace.cold+0x30/0x6f
      [ 6883.679597] ? lapic_can_unplug_cpu+0x80/0x80
      [ 6883.683956] nmi_trigger_cpumask_backtrace+0xef/0x110
      [ 6883.689010] trigger_single_cpu_backtrace+0x2a/0x31
      [ 6883.693888] rcu_dump_cpu_stacks+0xa7/0xe4
      [ 6883.697990] print_cpu_stall.cold+0x4f/0x17d
      [ 6883.702262] check_cpu_stall+0xe9/0x240
      [ 6883.706102] rcu_pending+0x26/0x190
      [ 6883.709595] rcu_sched_clock_irq+0x3d/0x180
      [ 6883.713779] update_process_times+0x8c/0xc0
      [ 6883.717967] tick_sched_handle+0x22/0x60
      [ 6883.721891] tick_sched_timer+0x65/0x80
      [ 6883.725731] ? tick_sched_do_timer+0xa0/0xa0
      [ 6883.730006] __hrtimer_run_queues+0x127/0x2c0
      [ 6883.734363] hrtimer_interrupt+0xfc/0x210
      [ 6883.738376] __sysvec_apic_timer_interrupt+0x5c/0x110
      [ 6883.743429] sysvec_apic_timer_interrupt+0x6d/0x90
      [ 6883.748223] </IRQ>
      [ 6883.750328] <TASK>
      [ 6883.752435] asm_sysvec_apic_timer_interrupt+0x16/0x20
      [ 6883.757573] RIP: 0010:xas_start+0x29/0xd0
      [ 6883.761587] Code: 90 48 8b 57 18 48 89 d0 83 e0 03 74 6b 48 83 f8 02 75 09 48 81 fa 05 c0 ff ff 77 31 48 8b 07 48 8b 57 08 48 8b 40 08 48 89 c1 <83> e1 03 48 83 f9 02 75 08 48 3d 00 10 00 00 77 21 48 85 d2 75 29
      [ 6883.780331] RSP: 0018:ffffb4f0003c7d80 EFLAGS: 00000202
      [ 6883.785558] RAX: ffff8dab4dcea912 RBX: 0000000000000002 RCX: ffff8dab4dcea912
      [ 6883.792693] RDX: 0000000000000004 RSI: 0000000000000004 RDI: ffffb4f0003c7d88
      [ 6883.799825] RBP: ffff8daa07ab0000 R08: 0000000000000001 R09: 0000000000000230
      [ 6883.806957] R10: ffffffffffffffff R11: ffffffffffffffc0 R12: ffffffffaadbf6d0
      [ 6883.814090] R13: ffff8db1c5cac100 R14: ffffffffade17da0 R15: ffffffffade17d54
      [ 6883.821224] ? xas_find+0x1c0/0x1c0
      [ 6883.824718] xa_get_mark+0x59/0x100
      [ 6883.828210] devlinks_xa_find_get.constprop.0+0x51/0x90
      [ 6883.833435] devlink_pernet_pre_exit+0x3c/0xf0
      [ 6883.837881] ? mutex_lock+0xe/0x30
      [ 6883.841286] cleanup_net+0x1d2/0x380
      [ 6883.844866] process_one_work+0x1e2/0x3b0
      [ 6883.848879] worker_thread+0x50/0x3a0
      [ 6883.852547] ? rescuer_thread+0x390/0x390
      [ 6883.856557] kthread+0xd6/0x100
      [ 6883.859704] ? kthread_complete_and_exit+0x20/0x20
      [ 6883.864497] ret_from_fork+0x1f/0x30
      [ 6883.868078] </TASK>
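      Both reports show the netns `cleanup_net` worker (PID 11) on CPU 7 stuck under `devlink_pernet_pre_exit()` and `devlinks_xa_find_get()`, with RIP in `xas_start`. To check whether every stall in the full console log points at the same place, a minimal sketch (assuming the console log linked under Additional info has been saved locally; `summarize_stalls` is a hypothetical helper name) counts the stall headers and the distinct interrupted functions:

```shell
#!/bin/bash
# Summarize RCU stall reports in a saved kernel console log.
# Usage: summarize_stalls FILE   (hypothetical helper, not a kernel tool)
summarize_stalls() {
    local log="$1"
    # Count the "self-detected stall" headers printed by the RCU subsystem.
    echo "stalls: $(grep -c 'rcu: INFO: rcu_preempt self-detected stall' "$log")"
    # Extract "RIP: <cs>:<function>", dropping the +offset/size suffix,
    # then count how often each interrupted function appears.
    grep -o 'RIP: [0-9a-f]*:[A-Za-z0-9_.]*' "$log" | sort | uniq -c
}
```

      For example, `summarize_stalls console.log`. Dropping the `+offset` suffix groups samples taken at different offsets in the same function (here `xas_start+0x1b` and `xas_start+0x29`).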

      Expected results:
      No kernel call traces.

      Additional info:
      job:
      https://beaker.engineering.redhat.com/jobs/8174634
      console log:
      https://beaker-archive.hosts.prod.psi.bos.redhat.com/beaker-logs/2023/08/81746/8174634/14402292/console.log

              ovsdpdk-triage
              Ting Li (tli@redhat.com)