-
Task
-
Resolution: Unresolved
-
Normal
-
None
-
None
-
False
-
-
False
-
-
rhel-10
-
None
-
rhel-net-ovs-dpdk
-
-
This ticket is tracking the QE verification effort for the solution to the problem described below.
Problem Description: Clearly explain the issue.
- Create two VFs on each PF.
- Start the openvswitch service.
- On the stock kernel, rebinding a VF from iavf to vfio-pci fails.
- On the rt-kernel, the rebind succeeds, but adding the ovs-bond port fails.
Impact Assessment: Describe the severity and impact (e.g., network down,availability of a workaround, etc.).
Based on the stock kernel test result, rebinding the VF to the vfio-pci driver fails and the driverctl command hangs.
Software Versions: Specify the exact versions in use (e.g.,openvswitch3.1-3.1.0-147.el8fdp).
openvswitch version: openvswitch3.5-3.5.0-0.21.el10fdp.x86_64
driverctl-0.115-2.el10.noarch
The NIC firmware has already been updated to the latest version.
[root@dell-per760-08 ~]# ethtool -i myeth_1
driver: i40e
version: 6.12.0-55.el10.x86_64
firmware-version: 9.53 0x8000f92e 1.3755.0
expansion-rom-version:
bus-info: 0000:b5:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
Kernel command line:
[root@dell-per760-08 ~]# cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-6.12.0-55.el10.x86_64 root=/dev/mapper/rhel_dell--per760--08-root ro pci=realloc crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M resume=UUID=ab9ddc95-b3e3-427e-b48e-12d6a9bbe1d8 rd.lvm.lv=rhel_dell-per760-08/root rd.lvm.lv=rhel_dell-per760-08/swap console=ttyS0,115200n81 default_hugepagesz=1G hugepagesz=1G hugepages=48 intel_iommu=on iommu=pt intel_idle.max_cstate=0 processor.max_cstate=0 intel_pstate=disable
Issue Type: Indicate whether this is a new issue or a regression (if a regression, state the last known working version).
New issue on RHEL 10.
Reproducibility: Confirm if the issue can be reproduced consistently. If not, describe how often it occurs.
100%
Reproduction Steps: Provide detailed steps or scripts to replicate the issue.
Run the script below:
#!/bin/bash
set -x
nic_name1=myeth_1
nic_name2=myeth_2
vf1_name=enp181s0f0v0
vf2_name=enp181s0f1v0
vf1_pci=0000:b5:02.0
vf2_pci=0000:b5:0a.0
ovs-vsctl list bridge 2>/dev/null | grep name | awk '{ system("ovs-vsctl --if-exist del-br "$3" &>/dev/null") }'
systemctl stop openvswitch &>/dev/null
ip link set ${nic_name1} down
ip link set ${nic_name2} down
rm -rf /etc/openvswitch/*.db
rm -rf /var/lib/openvswitch/*
rm -rf /dev/hugepages/rtemap_*
ip link set ${nic_name1} mtu 1500
ip link set ${nic_name2} mtu 1500
driverctl list-overrides 2>/dev/null | awk '{ system("driverctl unset-override "$1) }'
echo 0 > /sys/class/net/${nic_name1}/device/sriov_numvfs
echo 0 > /sys/class/net/${nic_name2}/device/sriov_numvfs
lsmod | grep vfio_pci || modprobe vfio_pci
ip link set ${nic_name1} up
ip link set ${nic_name2} up
ip link set ${nic_name1} mtu 9000
ip link set ${nic_name2} mtu 9000
echo 2 > /sys/class/net/${nic_name1}/device/sriov_numvfs
ip link set ${nic_name1} vf 0 spoofchk off
ip link set ${nic_name1} vf 0 trust on
ip link show ${nic_name1}
sleep 1
echo 2 > /sys/class/net/${nic_name2}/device/sriov_numvfs
ip link set ${nic_name2} vf 0 spoofchk off
ip link set ${nic_name2} vf 0 trust on
ip link show ${nic_name2}
sleep 1
systemctl restart openvswitch
ovs-vsctl set Open_vSwitch . other_config={}
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem='8192,8192'
ovs-vsctl --no-wait set Open_vSwitch . other_config:vhost-iommu-support=true
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
sleep 5
dev_list=$(ls "/sys/bus/pci/devices/${vf1_pci}/iommu_group/devices")
for i in $dev_list
do
driverctl -v list-overrides | grep "${i} vfio-pci" &>/dev/null || driverctl -v set-override ${i} vfio-pci
done
dev_list=$(ls "/sys/bus/pci/devices/${vf2_pci}/iommu_group/devices")
for i in $dev_list
do
driverctl -v list-overrides | grep "${i} vfio-pci" &>/dev/null || driverctl -v set-override ${i} vfio-pci
done
Expected Behavior: Describe what should happen under normal circumstances.
driverctl should successfully rebind the VF from the iavf driver to vfio-pci.
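One way to confirm the expected outcome is to read the `driver` symlink that sysfs exposes for each PCI device. The following is a minimal sketch, not part of the original reproducer: the function name `bound_driver` and the `SYSFS_PCI` override variable are hypothetical and exist only so the helper can be exercised against a fake directory tree.

```shell
#!/bin/bash
# Sketch: report which kernel driver a PCI device is currently bound to.
# SYSFS_PCI is a hypothetical override for testing; by default it points
# at the real sysfs PCI device directory.
SYSFS_PCI=${SYSFS_PCI:-/sys/bus/pci/devices}

bound_driver() {
    # $1 is a PCI address such as 0000:b5:02.0
    local link="${SYSFS_PCI}/$1/driver"
    if [ -L "$link" ]; then
        # The symlink target ends in the driver name, e.g. .../drivers/vfio-pci
        basename "$(readlink "$link")"
    else
        # No driver symlink means the device is currently unbound
        echo "none"
    fi
}

# Usage on the test host (addresses taken from the reproducer):
#   bound_driver 0000:b5:02.0
#   bound_driver 0000:b5:0a.0
```

After a successful rebind, both VF addresses would be expected to report vfio-pci rather than iavf.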
Observed Behavior: Explain what actually happens.
The driverctl command hangs, as shown below:
+ driverctl -v set-override 0000:b5:02.0 vfio-pci
driverctl: setting driver override for 0000:b5:02.0: vfio-pci
driverctl: loading driver vfio-pci
driverctl: unbinding previous driver vfio-pci
The following call trace is hit:
[ 855.571530] vfio-pci 0000:b5:02.0: Relaying device request to user (#50)
[ 862.739668] INFO: task systemd-journal:1344 blocked for more than 491 seconds.
[ 862.739671] Not tainted 6.12.0-55.el10.x86_64 #1
[ 862.739673] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 862.739674] task:systemd-journal state:D stack:0 pid:1344 tgid:1344 ppid:1 flags:0x00000006
[ 862.739677] Call Trace:
[ 862.739677] <TASK>
[ 862.739678] __schedule+0x259/0x640
[ 862.739681] schedule+0x27/0xa0
[ 862.739684] schedule_preempt_disabled+0x15/0x30
[ 862.739686] __mutex_lock.constprop.0+0x3d0/0x6d0
[ 862.739688] uevent_show+0xa7/0x130
[ 862.739689] dev_attr_show+0x19/0x40
[ 862.739691] sysfs_kf_seq_show+0xa8/0xf0
[ 862.739693] seq_read_iter+0x11c/0x460
[ 862.739695] vfs_read+0x299/0x370
[ 862.739698] ksys_read+0x6d/0xf0
[ 862.739701] do_syscall_64+0x7d/0x160
[ 862.739704] ? __do_sys_newfstat+0x68/0x70
[ 862.739706] ? syscall_exit_to_user_mode+0x32/0x190
[ 862.739709] ? do_syscall_64+0x89/0x160
[ 862.739711] ? __x64_sys_openat+0x55/0xa0
[ 862.739713] ? syscall_exit_to_user_mode+0x32/0x190
[ 862.739715] ? do_syscall_64+0x89/0x160
[ 862.739718] ? avc_has_perm+0x5e/0xe0
[ 862.739720] ? from_kgid_munged+0x12/0x30
[ 862.739721] ? cp_new_stat+0x131/0x170
[ 862.739724] ? __memcg_slab_free_hook+0x100/0x150
[ 862.739726] ? __x64_sys_close+0x3c/0x80
[ 862.739728] ? kmem_cache_free+0x3ee/0x440
[ 862.739731] ? syscall_exit_to_user_mode+0x32/0x190
[ 862.739733] ? do_syscall_64+0x89/0x160
[ 862.739735] ? do_syscall_64+0x89/0x160
[ 862.739737] ? __x64_sys_openat+0x55/0xa0
[ 862.739739] ? syscall_exit_to_user_mode+0x32/0x190
[ 862.739741] ? do_syscall_64+0x89/0x160
[ 862.739743] ? exc_page_fault+0x73/0x160
[ 862.739746] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 862.739748] RIP: 0033:0x7f81a8320321
[ 862.739750] RSP: 002b:00007ffed5c0e5c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 862.739751] RAX: ffffffffffffffda RBX: 0000563ee47c86d0 RCX: 00007f81a8320321
[ 862.739752] RDX: 0000000000001008 RSI: 0000563ee47c86d0 RDI: 0000000000000016
[ 862.739753] RBP: 00007ffed5c0e6d0 R08: 0000000000000001 R09: 000000000000000f
[ 862.739754] R10: 00000000000000ff R11: 0000000000000246 R12: 0000000000001008
[ 862.739755] R13: 0000000000000016 R14: 0000000000001008 R15: 00007ffed5c0e600
[ 862.739756] </TASK>
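The trace shows systemd-journal in state D (uninterruptible sleep), waiting on a mutex while reading a sysfs uevent attribute. To see whether other tasks are stuck the same way while driverctl hangs, a small filter over `ps` output can help. This is a sketch only; the function name `blocked_tasks` is hypothetical, and it prints just the first word of the command name.

```shell
#!/bin/bash
# Sketch: print "PID COMMAND" for every task whose state starts with D.
# Expects `ps -eo pid,stat,comm` style input on stdin (one header line,
# then PID, STAT and COMMAND columns).
blocked_tasks() {
    awk 'NR > 1 && $2 ~ /^D/ { print $1, $3 }'
}

# Usage on a live system:
#   ps -eo pid,stat,comm | blocked_tasks
# The kernel stack of one blocked task can then be read as root from
# /proc/<pid>/stack, if the running kernel exposes that file.
```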
Troubleshooting Actions: Outline the steps taken to diagnose or resolve the issue so far.
If I remove the openvswitch setup commands, the rebind works. Here is a working script:
lsmod | grep vfio_pci || modprobe vfio_pci
ip link set ${nic_name1} up
ip link set ${nic_name2} up
ip link set ${nic_name1} mtu 9000
ip link set ${nic_name2} mtu 9000
echo 2 > /sys/class/net/${nic_name1}/device/sriov_numvfs
ip link set ${nic_name1} vf 0 spoofchk off
ip link set ${nic_name1} vf 0 trust on
ip link show ${nic_name1}
sleep 1
echo 2 > /sys/class/net/${nic_name2}/device/sriov_numvfs
ip link set ${nic_name2} vf 0 spoofchk off
ip link set ${nic_name2} vf 0 trust on
ip link show ${nic_name2}
sleep 1
dev_list=$(ls "/sys/bus/pci/devices/${vf1_pci}/iommu_group/devices")
for i in $dev_list
do
driverctl -v list-overrides | grep "${i} vfio-pci" &>/dev/null || driverctl -v set-override ${i} vfio-pci
done
dev_list=$(ls "/sys/bus/pci/devices/${vf2_pci}/iommu_group/devices")
for i in $dev_list
do
driverctl -v list-overrides | grep "${i} vfio-pci" &>/dev/null || driverctl -v set-override ${i} vfio-pci
done
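Because the script above succeeds once the Open vSwitch setup is removed, and the kernel log shows vfio-pci relaying a device request to user space, it may be worth checking whether some process still holds a VFIO group or container open when driverctl hangs. That connection is an inference, not something confirmed in this ticket. The sketch below scans per-process file descriptors for targets under /dev/vfio/; the function name `vfio_holders` and the `PROC_ROOT` override are hypothetical and exist only for testing.

```shell
#!/bin/bash
# Sketch: list "PID TARGET" for every process with an open file under
# /dev/vfio/. PROC_ROOT is a hypothetical override for testing; by
# default it points at the real /proc.
PROC_ROOT=${PROC_ROOT:-/proc}

vfio_holders() {
    local fd target pid
    for fd in "$PROC_ROOT"/[0-9]*/fd/*; do
        # Skip entries that vanished or are not readable symlinks
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            /dev/vfio/*)
                # Extract the PID component from <PROC_ROOT>/<pid>/fd/<n>
                pid=${fd#"$PROC_ROOT"/}
                pid=${pid%%/*}
                echo "$pid $target"
                ;;
        esac
    done | sort -u
}

# Usage (as root, while driverctl is hung):
#   vfio_holders
```

If a process such as ovs-vswitchd shows up here, that would be consistent with the device still being in use, but the output alone does not prove the cause of the hang.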
Logs: If you collected logs please provide them (e.g. sos report, /var/log/openvswitch/* , testpmd console)
none