Bug
Resolution: Unresolved
Minor
rhel-9.4
Low
rhel-sst-virtualization
ssg_virtualization
5
False
Red Hat Enterprise Linux
x86_64
What were you trying to do that didn't work?
When post-copy migrating an RT VM that has vhost-user interfaces, QEMU threads race while accessing the RT VM's vhost-user device.
Please provide the package NVR for which bug is seen:
host:
5.14.0-426.el9.x86_64
qemu-kvm-8.2.0-6.el9.x86_64
libvirt-10.0.0-4.el9.x86_64
How reproducible:
The reproduction rate is below 10%
Steps to reproduce
1. Create an OVS-DPDK setup
2. Start an RT VM with three 1Q vhost-user interfaces
3. Post-copy migrate the VM
# /bin/virsh migrate --verbose --persistent --postcopy --live rhel9.4 qemu+ssh://192.168.1.2/system
# /bin/virsh migrate-postcopy rhel9.4
or
# /bin/virsh migrate --verbose --persistent --postcopy --timeout 3 --timeout-postcopy --live rhel9.4 qemu+ssh://192.168.1.2/system
4. Repeat the post-copy migration multiple times
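Because the failure shows up in fewer than one run in ten, step 4 is easier to drive from a loop. The following is only a minimal sketch: `repeat_until_fail` is a hypothetical helper, not part of virsh or libvirt, and it assumes the command passed to it leaves the VM ready for the next attempt (for example by migrating it back to the source host, which is not shown here).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of virsh/libvirt): run a command up to MAX
# times and stop at the first failure, reporting which attempt failed.
repeat_until_fail() {
    local max=$1
    shift
    local i
    for ((i = 1; i <= max; i++)); do
        if ! "$@"; then
            echo "failed on attempt $i"
            return 1
        fi
    done
    echo "all $max attempts succeeded"
    return 0
}

# Example use (assumption: migrate_round_trip is a user-written function that
# runs the virsh migrate command from step 3 and then moves the VM back):
#   repeat_until_fail 100 migrate_round_trip
```

The helper prints only the final outcome, so the qemu-kvm errors from the failing attempt remain the last entries in the domain log.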
Expected results
The RT VM's post-copy migration always succeeds and qemu-kvm does not report any errors
Actual results
The RT VM's post-copy migration sometimes fails (in my case, the repeated post-copy migration failed on the 40th attempt)
2024-03-03T08:02:51.622917Z qemu-kvm: Received unexpected msg type. Expected 22 received 30
2024-03-03T08:02:51.622920Z 2024-03-03T08:02:51.622954Z qemu-kvm: Failed to read msg header. Flags 0x0 instead of 0x5.
qemu-kvm: 2024-03-03T08:02:51.622979Z Fail to update device iotlb
qemu-kvm: Failed to receive reply to postcopy_end
2024-03-03T08:02:51.631614Z qemu-kvm: Failed to read msg header. Flags 0x0 instead of 0x5.
2024-03-03T08:02:51.631632Z qemu-kvm: Fail to update device iotlb
2024-03-03T08:02:51.633570Z qemu-kvm: Failed to read msg header. Flags 0x8 instead of 0x5.
2024-03-03T08:02:51.633587Z qemu-kvm: Fail to update device iotlb
2024-03-03T08:02:51.646651Z qemu-kvm: Failed to read msg header. Flags 0x16 instead of 0x5.
2024-03-03T08:02:51.646676Z qemu-kvm: Fail to update device iotlb
2024-03-03T08:02:51.658721Z qemu-kvm: Failed to read msg header. Flags 0x0 instead of 0x5.
2024-03-03T08:02:51.658736Z qemu-kvm: Fail to update device iotlb
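When comparing failed runs, it can help to tally the distinct qemu-kvm messages rather than read the raw log. The sketch below is an illustration only: `summarize_qemu_errors` is a hypothetical helper, and it assumes the log uses the timestamp and `qemu-kvm:` prefix format shown above. It strips both and counts identical messages.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count identical qemu-kvm messages in a log.
# Reads the files given as arguments, or stdin when none are given.
summarize_qemu_errors() {
    # Drop ISO-8601 timestamps and the "qemu-kvm:" prefix, trim whitespace,
    # discard empty lines, then count duplicates, most frequent first.
    sed -E 's/[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9:.]+Z//g; s/qemu-kvm://g; s/^[[:space:]]+//; s/[[:space:]]+$//' "$@" \
        | grep -v '^$' | sort | uniq -c | sort -rn
}

# Example use (the log path is an assumption; adjust to the actual domain log):
#   summarize_qemu_errors /var/log/libvirt/qemu/rhel9.4.log
```

Messages that differ only in their flag values, such as `Flags 0x0` and `Flags 0x8`, are counted separately, which keeps the variation in the corrupted headers visible.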
Additional info:
The script I used to create ovs-dpdk
modprobe -r openvswitch
modprobe openvswitch
/usr/bin/ovsdb-tool create /etc/openvswitch/conf.db /usr/share/openvswitch/vswitch.ovsschema
/usr/sbin/ovsdb-server --remote=punix:/var/run/openvswitch/db.sock --remote=db:Open_vSwitch,Open_vSwitch,manager_options --pidfile --detach --log-file
/usr/bin/ovs-vsctl --no-wait init
/usr/bin/ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem='1024,1024'
/usr/bin/ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask='0x1'
/usr/bin/ovs-vsctl --no-wait set Open_vSwitch . other_config:vhost-iommu-support=true
/usr/bin/ovs-vsctl --no-wait set Open_vSwitch . other_config:vhost-postcopy-support=true
/usr/bin/ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
/usr/sbin/ovs-vswitchd unix:/var/run/openvswitch/db.sock --pidfile --detach --log-file=/var/log/openvswitch/ovs-vswitchd.log
/usr/bin/ovs-vsctl --if-exists del-br ovsbr0
/usr/bin/ovs-vsctl add-br ovsbr0 -- set bridge ovsbr0 datapath_type=netdev
/usr/bin/ovs-vsctl add-port ovsbr0 dpdk0 -- set Interface dpdk0 type=dpdk options:dpdk-devargs=0000:5e:00.0 options:n_rxq=1 options:n_txq=1
/usr/bin/ovs-vsctl add-port ovsbr0 dpdk1 -- set Interface dpdk1 type=dpdk options:dpdk-devargs=0000:5e:00.1 options:n_rxq=1 options:n_txq=1
/usr/bin/ovs-vsctl add-port ovsbr0 vhost-user0 -- set Interface vhost-user0 type=dpdkvhostuserclient options:vhost-server-path=/tmp/vhostuser0.sock
/usr/bin/ovs-vsctl add-port ovsbr0 vhost-user1 -- set Interface vhost-user1 type=dpdkvhostuserclient options:vhost-server-path=/tmp/vhostuser1.sock
/usr/bin/ovs-ofctl del-flows ovsbr0
/usr/bin/ovs-ofctl add-flow ovsbr0 'in_port=1,idle_timeout=0 actions=output:3'
/usr/bin/ovs-ofctl add-flow ovsbr0 'in_port=3,idle_timeout=0 actions=output:1'
/usr/bin/ovs-ofctl add-flow ovsbr0 'in_port=2,idle_timeout=0 actions=output:4'
/usr/bin/ovs-ofctl add-flow ovsbr0 'in_port=4,idle_timeout=0 actions=output:2'
/usr/bin/ovs-vsctl --if-exists del-br ovsbr1
/usr/bin/ovs-vsctl add-br ovsbr1 -- set bridge ovsbr1 datapath_type=netdev
/usr/bin/ovs-vsctl add-port ovsbr1 dpdk2 -- set Interface dpdk2 type=dpdk options:dpdk-devargs=0000:60:00.0 options:n_rxq=1 options:n_txq=1
/usr/bin/ovs-vsctl add-port ovsbr1 vhost-user2 -- set Interface vhost-user2 type=dpdkvhostuserclient options:vhost-server-path=/tmp/vhostuser2.sock
/usr/bin/ovs-ofctl del-flows ovsbr1
/usr/bin/ovs-ofctl add-flow ovsbr1 'in_port=1,idle_timeout=0 actions=output:2'
/usr/bin/ovs-ofctl add-flow ovsbr1 'in_port=2,idle_timeout=0 actions=output:1'
/usr/bin/ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask=0x15554
/usr/bin/ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
The full VM xml
<domain type='kvm'>
  <name>rhel9.4</name>
  <uuid>fa5a02ba-e699-11ee-90a1-a0369fc7bbea</uuid>
  <memory unit='KiB'>8388608</memory>
  <currentMemory unit='KiB'>8388608</currentMemory>
  <memoryBacking>
    <hugepages>
      <page size='1048576' unit='KiB'/>
    </hugepages>
    <locked/>
  </memoryBacking>
  <vcpu placement='static'>6</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='22'/>
    <vcpupin vcpu='1' cpuset='32'/>
    <vcpupin vcpu='2' cpuset='30'/>
    <vcpupin vcpu='3' cpuset='28'/>
    <vcpupin vcpu='4' cpuset='26'/>
    <vcpupin vcpu='5' cpuset='24'/>
    <emulatorpin cpuset='3,5,7,9'/>
    <emulatorsched scheduler='fifo' priority='1'/>
    <vcpusched vcpus='0' scheduler='fifo' priority='1'/>
    <vcpusched vcpus='1' scheduler='fifo' priority='1'/>
    <vcpusched vcpus='2' scheduler='fifo' priority='1'/>
    <vcpusched vcpus='3' scheduler='fifo' priority='1'/>
    <vcpusched vcpus='4' scheduler='fifo' priority='1'/>
    <vcpusched vcpus='5' scheduler='fifo' priority='1'/>
  </cputune>
  <numatune>
    <memory mode='strict' nodeset='0'/>
    <memnode cellid='0' mode='strict' nodeset='0'/>
  </numatune>
  <os firmware='efi'>
    <type arch='x86_64' machine='pc-q35-rhel9.4.0'>hvm</type>
    <firmware>
      <feature enabled='no' name='enrolled-keys'/>
      <feature enabled='yes' name='secure-boot'/>
    </firmware>
    <loader readonly='yes' secure='yes' type='pflash'>/usr/share/edk2/ovmf/OVMF_CODE.secboot.fd</loader>
    <nvram template='/usr/share/edk2/ovmf/OVMF_VARS.fd'>/var/lib/libvirt/qemu/nvram/rhel9.4_VARS.fd</nvram>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <pmu state='off'/>
    <vmport state='off'/>
    <smm state='on'/>
    <ioapic driver='qemu'/>
  </features>
  <cpu mode='host-model' check='partial'>
    <topology sockets='3' dies='1' clusters='1' cores='1' threads='2'/>
    <feature policy='require' name='tsc-deadline'/>
    <numa>
      <cell id='0' cpus='0-5' memory='8388608' unit='KiB' memAccess='shared'/>
    </numa>
  </cpu>
  <clock offset='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none' io='threads' iommu='on' ats='on'/>
      <source file='/mnt/nfv//rhel9.4.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </disk>
    <controller type='usb' index='0' model='none'/>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x10'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x11'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0x12'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0x13'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0x14'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x4'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0x15'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x5'/>
    </controller>
    <controller type='pci' index='7' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='7' port='0x16'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x6'/>
    </controller>
    <controller type='pci' index='8' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='8' port='0x17'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x7'/>
    </controller>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <interface type='bridge'>
      <mac address='28:66:da:5f:dd:01'/>
      <source bridge='switch'/>
      <model type='virtio'/>
      <driver name='vhost' iommu='on' ats='on'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
    <interface type='vhostuser'>
      <mac address='18:66:da:5f:dd:02'/>
      <source type='unix' path='/tmp/vhostuser0.sock' mode='server'/>
      <model type='virtio'/>
      <driver name='vhost' rx_queue_size='1024' tx_queue_size='1024' iommu='on' ats='on'>
        <host mrg_rxbuf='on'/>
      </driver>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </interface>
    <interface type='vhostuser'>
      <mac address='18:66:da:5f:dd:03'/>
      <source type='unix' path='/tmp/vhostuser1.sock' mode='server'/>
      <model type='virtio'/>
      <driver name='vhost' rx_queue_size='1024' tx_queue_size='1024' iommu='on' ats='on'>
        <host mrg_rxbuf='on'/>
      </driver>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
    </interface>
    <interface type='vhostuser'>
      <mac address='18:66:da:5f:dd:04'/>
      <source type='unix' path='/tmp/vhostuser2.sock' mode='server'/>
      <model type='virtio'/>
      <driver name='vhost' rx_queue_size='1024' tx_queue_size='1024' iommu='on' ats='on'>
        <host mrg_rxbuf='on'/>
      </driver>
      <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <tpm model='tpm-crb'>
      <backend type='emulator' version='2.0'/>
    </tpm>
    <graphics type='vnc' port='-1' autoport='yes' listen='0.0.0.0'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
    <audio id='1' type='none'/>
    <video>
      <model type='cirrus' vram='16384' heads='1' primary='yes'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
    </video>
    <watchdog model='itco' action='reset'/>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
      <driver iommu='on' ats='on'/>
    </memballoon>
    <iommu model='intel'>
      <driver intremap='on' caching_mode='on' iotlb='on'/>
    </iommu>
  </devices>
</domain>
- impacts account
-
RHEL-41264 virsh may request switching to post-copy before migration has started when --timeout-postcopy is used with a short timeout