Type: Bug
Resolution: Done
Priority: Normal
Fix Version: rhel-9.3.0
Severity: Moderate
Team: rhel-virt-networking
QE Group: ssg_virtualization
Test Coverage: Automated
Doc Type: If docs needed, set a value
Architecture: x86_64
Description of problem:
A domain with a vhost-user interface and iommu enabled throws a "Call Trace" when running the netperf tests.
Version-Release number of selected component (if applicable):
qemu-kvm-8.0.0-4.el9.x86_64
kernel-5.14.0-323.el9.x86_64
dpdk-22.11-3.el9_2.x86_64
openvswitch3.1-3.1.0-28.el9fdp.x86_64
How reproducible:
100%
Steps to Reproduce:
1. Set up the host kernel options (CPU isolation, huge pages, IOMMU):
# grubby --args="iommu=pt intel_iommu=on default_hugepagesz=1G" --update-kernel=`grubby --default-kernel`
# echo "isolated_cores=2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,31,29,27,25,23,21,19,17,15,13,11" >> /etc/tuned/cpu-partitioning-variables.conf
# tuned-adm profile cpu-partitioning
# reboot
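After the reboot, the applied options can be verified with checks like the following (suggested verification, not part of the reporter's original steps):
# cat /proc/cmdline
# grep Huge /proc/meminfo
# tuned-adm active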
2. Start OVS-DPDK on the host:
# echo 20 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
# echo 20 > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
# modprobe vfio
# modprobe vfio-pci
# dpdk-devbind.py --bind=vfio-pci 0000:5e:00.0
# dpdk-devbind.py --bind=vfio-pci 0000:5e:00.1
...
# ovs-vsctl get Open_vSwitch . other_config
{dpdk-init="true", dpdk-lcore-mask="0x2", dpdk-socket-mem="1024,1024", pmd-cpu-mask="0x15554", vhost-iommu-support="true"}
# ovs-vsctl show
1e271d29-308d-4201-be11-d898617cc592
    Bridge ovsbr0
        datapath_type: netdev
        Port ovsbr0
            Interface ovsbr0
                type: internal
        Port dpdk0
            Interface dpdk0
                type: dpdk
                options: {dpdk-devargs="0000:5e:00.0", n_rxq="2", n_txq="2"}
        Port vhost-user0
            Interface vhost-user0
                type: dpdkvhostuserclient
                options: {vhost-server-path="/tmp/vhostuser0.sock"}
    Bridge ovsbr1
        datapath_type: netdev
        Port ovsbr1
            Interface ovsbr1
                type: internal
        Port dpdk1
            Interface dpdk1
                type: dpdk
                options: {dpdk-devargs="0000:5e:00.1", n_rxq="2", n_txq="2"}
        Port vhost-user1
            Interface vhost-user1
                type: dpdkvhostuserclient
                options: {vhost-server-path="/tmp/vhostuser1.sock"}
3. Start an NFV domain with an iommu device and vhost-user interfaces:
<interface type='vhostuser'>
  <mac address='18:66:da:5f:dd:22'/>
  <source type='unix' path='/tmp/vhostuser0.sock' mode='server'/>
  <target dev='vhost-user0'/>
  <model type='virtio'/>
  <driver name='vhost' queues='2' rx_queue_size='1024' iommu='on' ats='on'/>
  <alias name='net1'/>
</interface>
<iommu model='intel'>
  <driver intremap='on' caching_mode='on' iotlb='on'/>
</iommu>
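Inside the guest, the emulated vIOMMU can be confirmed to be active before running the tests (a suggested sanity check, not from the original report):
# dmesg | grep -e DMAR -e IOMMU
# ls /sys/class/iommu/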
4. Set up the kernel options in the domain:
# grubby --args="iommu=pt intel_iommu=on default_hugepagesz=1G" --update-kernel=`grubby --default-kernel`
# echo "isolated_cores=1,2,3,4,5" >> /etc/tuned/cpu-partitioning-variables.conf
# tuned-adm profile cpu-partitioning
# reboot
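The active tuned profile and isolated cores can be confirmed after the guest reboots (suggested verification, not part of the reporter's steps):
# tuned-adm active
# cat /sys/devices/system/cpu/isolated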
5. Run the netperf tests between the domain (client) and the host (server):
(5.1) The host is the netperf server:
# ip addr add 192.168.1.3/24 dev ens3f1
# netserver
Starting netserver with host 'IN(6)ADDR_ANY' port '12865' and family AF_UNSPEC
(5.2) The domain is the netperf client:
# ip addr add 192.168.1.2/24 dev enp6s0   <-- the domain can ping 192.168.1.3 successfully, but with some packet loss
# netperf -H 192.168.1.3/24
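Note: netperf's -H option normally takes a bare address without a prefix length; an equivalent, more conventional invocation would be something like the following (TCP_STREAM and the 60-second duration are illustrative assumptions):
# netperf -H 192.168.1.3 -t TCP_STREAM -l 60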
6. Check the domain dmesg:
# dmesg
[ 4802.234530] ------------[ cut here ]------------
[ 4802.234532] NETDEV WATCHDOG: enp6s0 (virtio_net): transmit queue 0 timed out
[ 4802.234549] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:525 dev_watchdog+0x1f9/0x200
[ 4802.236690] Modules linked in: intel_rapl_msr intel_rapl_common isst_if_common nfit libnvdimm kvm_intel kvm nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bridge ip_set stp llc iTCO_wdt rfkill iTCO_vendor_support nf_tables irqbypass nfnetlink rapl virtio_balloon i2c_i801 i2c_smbus lpc_ich qrtr pcspkr vfat fat drm fuse xfs libcrc32c ahci libahci nvme_tcp nvme_fabrics nvme libata nvme_core nvme_common t10_pi crct10dif_pclmul crc32_pclmul crc32c_intel virtio_net ghash_clmulni_intel net_failover virtio_blk failover serio_raw sunrpc dm_mirror dm_region_hash dm_log dm_mod
[ 4802.243011] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Not tainted 5.14.0-323.el9.x86_64 #1
[ 4802.243900] Hardware name: Red Hat KVM/RHEL, BIOS edk2-20230301gitf80f052277c8-5.el9 03/01/2023
[ 4802.244809] RIP: 0010:dev_watchdog+0x1f9/0x200
[ 4802.245284] Code: 00 e9 40 ff ff ff 48 89 ef c6 05 03 af 7a 01 01 e8 3c c5 fa ff 44 89 e9 48 89 ee 48 c7 c7 a0 b1 6d 97 48 89 c2 e8 17 82 77 ff <0f> 0b e9 22 ff ff ff 0f 1f 44 00 00 55 53 48 89 fb 48 8b 6f 18 0f
[ 4802.247210] RSP: 0018:ffffb32980003eb0 EFLAGS: 00010286
[ 4802.247766] RAX: 0000000000000000 RBX: ffff99428b8ff488 RCX: 0000000000000027
[ 4802.248511] RDX: 0000000000000027 RSI: ffffffff97e67460 RDI: ffff994337c1f8c8
[ 4802.249262] RBP: ffff99428b8ff000 R08: ffff994337c1f8c0 R09: 0000000000000000
[ 4802.250016] R10: ffffffffffffffff R11: ffffffff98b6f070 R12: ffff99428b8ff3dc
[ 4802.250766] R13: 0000000000000000 R14: ffffffff96b7e5b0 R15: ffffb32980003f08
[ 4802.251516] FS: 0000000000000000(0000) GS:ffff994337c00000(0000) knlGS:0000000000000000
[ 4802.252364] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4802.252977] CR2: 00007ffe7a749000 CR3: 0000000101d54004 CR4: 0000000000770ef0
[ 4802.253732] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4802.254479] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 4802.255230] PKRU: 55555554
[ 4802.255532] Call Trace:
[ 4802.255805]  <IRQ>
[ 4802.256031]  ? pfifo_fast_change_tx_queue_len+0x70/0x70
[ 4802.256586]  call_timer_fn+0x24/0x130
[ 4802.256986]  __run_timers.part.0+0x1ee/0x280
[ 4802.257444]  ? enqueue_hrtimer+0x2f/0x80
[ 4802.257870]  ? __hrtimer_run_queues+0x159/0x2c0
[ 4802.258358]  run_timer_softirq+0x26/0x50
[ 4802.258785]  __do_softirq+0xc7/0x2ac
[ 4802.259173]  __irq_exit_rcu+0xb9/0xf0
[ 4802.259573]  sysvec_apic_timer_interrupt+0x72/0x90
[ 4802.260084]  </IRQ>
[ 4802.260318]  <TASK>
[ 4802.260559]  asm_sysvec_apic_timer_interrupt+0x16/0x20
[ 4802.261103] RIP: 0010:default_idle+0x10/0x20
[ 4802.261571] Code: 8b 04 25 40 ef 01 00 f0 80 60 02 df c3 cc cc cc cc 0f ae 38 eb bb 0f 1f 40 00 0f 1f 44 00 00 66 90 0f 00 2d be da 47 00 fb f4 <c3> cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 65
[ 4802.263496] RSP: 0018:ffffffff97e03ea8 EFLAGS: 00000252
[ 4802.264050] RAX: ffffffff96d8d320 RBX: ffffffff97e1a940 RCX: 0000000000000000
[ 4802.264803] RDX: 4000000000000000 RSI: ffff994337c22b20 RDI: 000000000497eebc
[ 4802.265554] RBP: 0000000000000000 R08: 0000045e163d1cbb R09: ffff9941d6202400
[ 4802.266301] R10: 0000000000020604 R11: 0000000000000000 R12: 0000000000000000
[ 4802.267054] R13: 000000006dc53d18 R14: 000000006d3c47a8 R15: 000000006d3c47b0
[ 4802.267810]  ? mwait_idle+0x70/0x70
[ 4802.268189]  default_idle_call+0x33/0xe0
[ 4802.268615]  cpuidle_idle_call+0x125/0x160
[ 4802.269051]  ? kvm_sched_clock_read+0x14/0x30
[ 4802.269519]  do_idle+0x78/0xe0
[ 4802.269891]  cpu_startup_entry+0x19/0x20
[ 4802.270311]  rest_init+0xca/0xd0
[ 4802.270671]  arch_call_rest_init+0xa/0x14
[ 4802.271099]  start_kernel+0x4a3/0x4c2
[ 4802.271495]  secondary_startup_64_no_verify+0xe5/0xeb
[ 4802.272037]  </TASK>
[ 4802.272279] ---[ end trace 87fb221169225dfd ]---
Besides the above "Call Trace", the domain keeps throwing messages like "virtio_net virtio3 enp6s0: TX timeout on queue: 0, sq: output.0, vq: 0x1, name: output.0, 7820000 usecs ago".
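These recurring messages can be followed live while the test runs (a suggested way to observe them, not part of the original report):
# dmesg -w | grep -i "tx timeout"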
7. Run the ping tests:
# ping 192.168.1.3
PING 192.168.1.3 (192.168.1.3) 56(84) bytes of data.
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
...
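When ping starts failing with "No buffer space available", the stalled TX queue can be inspected from inside the guest (suggested diagnostics, not part of the reporter's steps):
# ip -s link show enp6s0
# tc -s qdisc show dev enp6s0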
Actual results:
A domain with a vhost-user interface and iommu enabled throws a "Call Trace" when running the netperf tests.
Expected results:
No "Call Trace" in the domain dmesg.
Additional info: