Fast Datapath Product / FDP-2549

Test Coverage: [x710]vf reload driver failed after start openvswitch

      ( ) The test coverage is aligned with the epic's acceptance criteria

      Given a system administrator runs OVS-DPDK with x710 NICs and uses driverctl to reload VF drivers from iavf to vfio-pci,

      When executing driver reload operations post-OVS initialization,

      Then VF driver reload succeeds without hanging, and VF interfaces can be immediately used by OVS-DPDK
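The "Then" clause can be turned into a mechanical check by reading the driver symlink that sysfs exposes for each PCI device. The following is a minimal sketch, not the project's actual test code. `bound_driver`, `all_bound_to_vfio`, and the `SYSFS_ROOT` variable are hypothetical names introduced here; `SYSFS_ROOT` exists only so the functions can be exercised against a fake directory tree.

```shell
#!/bin/bash
# SYSFS_ROOT is a hypothetical override for dry runs; it defaults to /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Print the driver bound to PCI device $1, or "none" if it is unbound.
bound_driver() {
    local link="${SYSFS_ROOT}/bus/pci/devices/$1/driver"
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}

# Succeed only if every listed PCI device is bound to vfio-pci.
all_bound_to_vfio() {
    local pci
    for pci in "$@"; do
        [ "$(bound_driver "$pci")" = "vfio-pci" ] || return 1
    done
}
```

With the VF addresses from this report, a test case could end with `all_bound_to_vfio 0000:b5:02.0 0000:b5:0a.0 && echo PASS`.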

    • rhel-10
    • None
    • rhel-net-ovs-dpdk

      This task is tracking the test case writing activities to cover the bug described below.

       Problem Description: Clearly explain the issue.

      1. Create two VFs on each PF.
      2. Start the openvswitch service.
      3. Reloading the VF driver from iavf to vfio-pci fails on the stock kernel.
      4. On the RT kernel, the reload operation succeeds, but adding an OVS bond port fails.

       Impact Assessment: Describe the severity and impact (e.g., network down, availability of a workaround, etc.).

      Based on the stock kernel test results, reloading the VF to the vfio-pci driver fails and the driverctl command hangs.
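Because the failure mode is a hang rather than an error code, an automated test needs a deadline around the driverctl call. The sketch below runs a command in the background and polls for completion. `run_with_deadline` is a hypothetical helper, and the return code 124 is borrowed from the convention of the coreutils `timeout` utility. Polling is used on the assumption that a process blocked in uninterruptible sleep may not react to the signal `timeout` would send.

```shell
#!/bin/bash
# Run a command with a deadline in seconds.
# Usage: run_with_deadline SECONDS COMMAND [ARGS...]
# Returns the command's exit status, or 124 if it is still running
# when the deadline expires (the process is left behind for inspection).
run_with_deadline() {
    local deadline="$1"
    shift
    "$@" &
    local pid=$!
    local waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$deadline" ]; then
            echo "HANG: '$*' still running after ${deadline}s (pid ${pid})" >&2
            return 124
        fi
        sleep 1
        waited=$((waited + 1))
    done
    wait "$pid"
}
```

For example, `run_with_deadline 60 driverctl -v set-override 0000:b5:02.0 vfio-pci` would report a hang after one minute instead of blocking the whole test run; the 60-second value is an arbitrary choice.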

       Software Versions: Specify the exact versions in use (e.g., openvswitch3.1-3.1.0-147.el8fdp).

      openvswitch version: openvswitch3.5-3.5.0-0.21.el10fdp.x86_64

      driverctl-0.115-2.el10.noarch

      The NIC firmware has already been updated to the latest version.

      [root@dell-per760-08 ~]# ethtool -i myeth_1
      driver: i40e
      version: 6.12.0-55.el10.x86_64
      firmware-version: 9.53 0x8000f92e 1.3755.0
      expansion-rom-version: 
      bus-info: 0000:b5:00.0
      supports-statistics: yes
      supports-test: yes
      supports-eeprom-access: yes
      supports-register-dump: yes
      supports-priv-flags: yes
       

      Kernel command line:

      [root@dell-per760-08 ~]# cat /proc/cmdline 
      BOOT_IMAGE=(hd0,gpt2)/vmlinuz-6.12.0-55.el10.x86_64 root=/dev/mapper/rhel_dell--per760--08-root ro pci=realloc crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M resume=UUID=ab9ddc95-b3e3-427e-b48e-12d6a9bbe1d8 rd.lvm.lv=rhel_dell-per760-08/root rd.lvm.lv=rhel_dell-per760-08/swap console=ttyS0,115200n81 default_hugepagesz=1G hugepagesz=1G hugepages=48 intel_iommu=on iommu=pt intel_idle.max_cstate=0 processor.max_cstate=0 intel_pstate=disable 

        Issue Type: Indicate whether this is a new issue or a regression (if a regression, state the last known working version).

      New issue on RHEL 10.

       Reproducibility: Confirm if the issue can be reproduced consistently. If not, describe how often it occurs.

      100%

       Reproduction Steps: Provide detailed steps or scripts to replicate the issue.

      Run the script below:

      #!/bin/bash
      set -x
      nic_name1=myeth_1
      nic_name2=myeth_2
      vf1_name=enp181s0f0v0
      vf2_name=enp181s0f1v0
      vf1_pci=0000:b5:02.0
      vf2_pci=0000:b5:0a.0
      ovs-vsctl list bridge 2>/dev/null | grep name | awk '{
       system("ovs-vsctl --if-exists del-br "$3" &>/dev/null")
      }'
      systemctl stop openvswitch &>/dev/null
      ip link set ${nic_name1} down
      ip link set ${nic_name2} down
      rm -rf /etc/openvswitch/*.db
      rm -rf /var/lib/openvswitch/*
      rm -rf /dev/hugepages/rtemap_*
      ip link set ${nic_name1} mtu 1500
      ip link set ${nic_name2} mtu 1500
      driverctl list-overrides 2>/dev/null | awk '{ system("driverctl unset-override "$1) }'
      echo 0 > /sys/class/net/${nic_name1}/device/sriov_numvfs
      echo 0 > /sys/class/net/${nic_name2}/device/sriov_numvfs
      lsmod | grep vfio_pci || modprobe vfio_pci
      ip link set ${nic_name1} up
      ip link set ${nic_name2} up
      ip link set ${nic_name1} mtu 9000
      ip link set ${nic_name2} mtu 9000
      echo 2 > /sys/class/net/${nic_name1}/device/sriov_numvfs
      ip link set ${nic_name1} vf 0 spoofchk off
      ip link set ${nic_name1} vf 0 trust on
      ip link show ${nic_name1}
      sleep 1
      echo 2 > /sys/class/net/${nic_name2}/device/sriov_numvfs
      ip link set ${nic_name2} vf 0 spoofchk off
      ip link set ${nic_name2} vf 0 trust on
      ip link show ${nic_name2}
      sleep 1
      systemctl restart openvswitch
      ovs-vsctl set Open_vSwitch . other_config={}
      ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem='8192,8192'
      ovs-vsctl --no-wait set Open_vSwitch . other_config:vhost-iommu-support=true
      ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
      sleep 5
      dev_list=$(ls "/sys/bus/pci/devices/${vf1_pci}/iommu_group/devices")
      for i in $dev_list
      do
              driverctl -v list-overrides | grep "${i} vfio-pci" &>/dev/null || driverctl -v set-override ${i} vfio-pci
      done
      dev_list=$(ls "/sys/bus/pci/devices/${vf2_pci}/iommu_group/devices")
      for i in $dev_list
      do
              driverctl -v list-overrides | grep "${i} vfio-pci" &>/dev/null || driverctl -v set-override ${i} vfio-pci
      done
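The loops above decide whether to set an override by grepping the `driverctl list-overrides` output, where grep treats the dots in the PCI address as wildcards and matches substrings. A stricter variant could compare whole fields instead. This is a sketch only: `has_override` is a hypothetical helper, and it assumes the `<pci-address> <driver>` line format implied by the grep pattern in the script.

```shell
#!/bin/bash
# Succeed if stdin (lines of "<pci-address> <driver>") contains an exact
# entry for device $1 with driver $2.
has_override() {
    awk -v dev="$1" -v drv="$2" \
        '$1 == dev && $2 == drv { found = 1 } END { exit !found }'
}

# Possible use inside the loops of the reproducer (not executed here):
#   driverctl list-overrides | has_override "$i" vfio-pci \
#       || driverctl -v set-override "$i" vfio-pci
```

The change affects only how an existing override is detected; the `set-override` call itself is the same as in the script.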

       Expected Behavior: Describe what should happen under normal circumstances.

      driverctl should successfully rebind the VF from the iavf driver to vfio-pci.

       Observed Behavior: Explain what actually happens.

      The driverctl command hangs, as shown below:

      + driverctl -v set-override 0000:b5:02.0 vfio-pci
      driverctl: setting driver override for 0000:b5:02.0: vfio-pci
      driverctl: loading driver vfio-pci
      driverctl: unbinding previous driver vfio-pci
       

      The kernel then logs the following call trace:

      [  855.571530] vfio-pci 0000:b5:02.0: Relaying device request to user (#50)
      [  862.739668] INFO: task systemd-journal:1344 blocked for more than 491 seconds.
      [  862.739671]       Not tainted 6.12.0-55.el10.x86_64 #1
      [  862.739673] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  862.739674] task:systemd-journal state:D stack:0     pid:1344  tgid:1344  ppid:1      flags:0x00000006
      [  862.739677] Call Trace:
      [  862.739677]  <TASK>
      [  862.739678]  __schedule+0x259/0x640
      [  862.739681]  schedule+0x27/0xa0
      [  862.739684]  schedule_preempt_disabled+0x15/0x30
      [  862.739686]  __mutex_lock.constprop.0+0x3d0/0x6d0
      [  862.739688]  uevent_show+0xa7/0x130
      [  862.739689]  dev_attr_show+0x19/0x40
      [  862.739691]  sysfs_kf_seq_show+0xa8/0xf0
      [  862.739693]  seq_read_iter+0x11c/0x460
      [  862.739695]  vfs_read+0x299/0x370
      [  862.739698]  ksys_read+0x6d/0xf0
      [  862.739701]  do_syscall_64+0x7d/0x160
      [  862.739704]  ? __do_sys_newfstat+0x68/0x70
      [  862.739706]  ? syscall_exit_to_user_mode+0x32/0x190
      [  862.739709]  ? do_syscall_64+0x89/0x160
      [  862.739711]  ? __x64_sys_openat+0x55/0xa0
      [  862.739713]  ? syscall_exit_to_user_mode+0x32/0x190
      [  862.739715]  ? do_syscall_64+0x89/0x160
      [  862.739718]  ? avc_has_perm+0x5e/0xe0
      [  862.739720]  ? from_kgid_munged+0x12/0x30
      [  862.739721]  ? cp_new_stat+0x131/0x170
      [  862.739724]  ? __memcg_slab_free_hook+0x100/0x150
      [  862.739726]  ? __x64_sys_close+0x3c/0x80
      [  862.739728]  ? kmem_cache_free+0x3ee/0x440
      [  862.739731]  ? syscall_exit_to_user_mode+0x32/0x190
      [  862.739733]  ? do_syscall_64+0x89/0x160
      [  862.739735]  ? do_syscall_64+0x89/0x160
      [  862.739737]  ? __x64_sys_openat+0x55/0xa0
      [  862.739739]  ? syscall_exit_to_user_mode+0x32/0x190
      [  862.739741]  ? do_syscall_64+0x89/0x160
      [  862.739743]  ? exc_page_fault+0x73/0x160
      [  862.739746]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
      [  862.739748] RIP: 0033:0x7f81a8320321
      [  862.739750] RSP: 002b:00007ffed5c0e5c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
      [  862.739751] RAX: ffffffffffffffda RBX: 0000563ee47c86d0 RCX: 00007f81a8320321
      [  862.739752] RDX: 0000000000001008 RSI: 0000563ee47c86d0 RDI: 0000000000000016
      [  862.739753] RBP: 00007ffed5c0e6d0 R08: 0000000000000001 R09: 000000000000000f
      [  862.739754] R10: 00000000000000ff R11: 0000000000000246 R12: 0000000000001008
      [  862.739755] R13: 0000000000000016 R14: 0000000000001008 R15: 00007ffed5c0e600
      [  862.739756]  </TASK>
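The trace shows systemd-journal in state D (uninterruptible sleep) while waiting on a mutex. When reproducing the hang, it may help to list every task in that state, since the hung-task detector reports a task only after its timeout. A minimal sketch follows; `filter_dstate` is a hypothetical helper that works on `ps` output.

```shell
#!/bin/bash
# Read "PID STAT COMMAND" rows on stdin and print "PID COMMAND" for tasks
# whose state starts with D (uninterruptible sleep). Only the first word
# of the command name is printed.
filter_dstate() {
    awk '$2 ~ /^D/ { print $1, $3 }'
}

# Typical use on a live system (the trailing "=" suppresses the header):
#   ps -eo pid=,stat=,comm= | filter_dstate
```

Running this while driverctl is stuck would be expected to show driverctl itself alongside any other blocked tasks, though that is an assumption rather than something captured in this report.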
       

       Troubleshooting Actions: Outline the steps taken to diagnose or resolve the issue so far.

      If I remove the openvswitch setup commands, the reload works. Here is a working script:

      lsmod | grep vfio_pci || modprobe vfio_pci
      ip link set ${nic_name1} up
      ip link set ${nic_name2} up
      ip link set ${nic_name1} mtu 9000
      ip link set ${nic_name2} mtu 9000
      echo 2 > /sys/class/net/${nic_name1}/device/sriov_numvfs
      ip link set ${nic_name1} vf 0 spoofchk off
      ip link set ${nic_name1} vf 0 trust on
      ip link show ${nic_name1}
      sleep 1
      echo 2 > /sys/class/net/${nic_name2}/device/sriov_numvfs
      ip link set ${nic_name2} vf 0 spoofchk off
      ip link set ${nic_name2} vf 0 trust on
      ip link show ${nic_name2}
      sleep 1
      dev_list=$(ls "/sys/bus/pci/devices/${vf1_pci}/iommu_group/devices")
      for i in $dev_list
      do
          driverctl -v list-overrides | grep "${i} vfio-pci" &>/dev/null || driverctl -v set-override ${i} vfio-pci
      done
      dev_list=$(ls "/sys/bus/pci/devices/${vf2_pci}/iommu_group/devices")
      for i in $dev_list
      do
          driverctl -v list-overrides | grep "${i} vfio-pci" &>/dev/null || driverctl -v set-override ${i} vfio-pci
      done 

       Logs: If you collected logs please provide them (e.g. sos report, /var/log/openvswitch/* , testpmd console)

      none

              ovsdpdk-triage ovsdpdk triage
              nstbot NST Bot