- Bug
- Resolution: Done
- Normal
- rhel-9.4
- Moderate
- rhel-sst-virtualization
- ssg_virtualization
- QE ack
- False
- Red Hat Enterprise Linux
- RegressionOnly
- x86_64
What were you trying to do that didn't work?
During live migration of a VM with an mlx5 VFIO VF, the "Total downtime" value reported by virsh domjobinfo is much larger than the actual downtime.
Please provide the package NVR for which bug is seen:
host:
5.14.0-378.el9.x86_64
qemu-kvm-8.1.0-1.el9.x86_64
libvirt-9.5.0-7.el9_3.x86_64
VM:
5.14.0-378.el9.x86_64
How reproducible:
100%
Steps to reproduce
1. create an MT2910 VF and set up the VF for migration
2. start a Q35 + SeaBIOS VM with an mlx5_vfio_pci VF
3. migrate the VM
# /bin/virsh migrate --live --domain rhel94 --desturi qemu+ssh://10.73.212.96/system
Note:
Command '/bin/virsh migrate --live --domain rhel94 --desturi qemu+ssh://10.73.212.96/system' finished after 7.491417407989502s
4. check the migration info
# /bin/virsh domjobinfo rhel94 --completed
Job type: Completed
Operation: Outgoing migration
Time elapsed: 6561 ms
Data processed: 505.259 MiB
Data remaining: 0.000 B
Data total: 4.016 GiB
Memory processed: 505.259 MiB
Memory remaining: 0.000 B
Memory total: 4.016 GiB
Memory bandwidth: 94.846 MiB/s
Dirty rate: 0 pages/s
Page size: 4096 bytes
Iteration: 5
Postcopy requests: 0
Constant pages: 927068
Normal pages: 127061
Normal data: 496.332 MiB
Total downtime: 5820289 ms <-- The Total downtime value is wrong
Downtime w/o network: 216 ms
Setup time: 44 ms
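The reason the value above is clearly wrong is that downtime is part of the migration, so it cannot exceed the elapsed time (6561 ms here). A minimal sketch of that sanity check follows. The helper names `parse_ms_fields` and `downtime_is_plausible` are hypothetical, and the sample text is copied from the output above rather than read from a live host.

```python
import re


def parse_ms_fields(text):
    """Return {field name: milliseconds} for every 'Name: <int> ms' line."""
    fields = {}
    for line in text.splitlines():
        m = re.match(r"\s*([^:]+):\s*(\d+)\s*ms\b", line)
        if m:
            fields[m.group(1).strip()] = int(m.group(2))
    return fields


def downtime_is_plausible(fields):
    """Downtime is a part of the migration, so it must not exceed elapsed time."""
    return fields["Total downtime"] <= fields["Time elapsed"]


# Millisecond fields copied from the domjobinfo output in this report.
SAMPLE = """\
Time elapsed: 6561 ms
Total downtime: 5820289 ms
Downtime w/o network: 216 ms
Setup time: 44 ms
"""

fields = parse_ms_fields(SAMPLE)
print(fields["Total downtime"], fields["Time elapsed"], downtime_is_plausible(fields))
```

With the values from this report the check fails, since 5820289 ms is far larger than 6561 ms.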
# virsh qemu-monitor-command --hmp rhel94 "info migrate"
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
clear-bitmap-shift: 18
Migration status: device <-- The "Migration status" is "device", not "completed"
total time: 5318 ms
expected downtime: 300 ms <-- I cannot see the "downtime" field in "info migrate"
setup: 44 ms
transferred ram: 517385 kbytes
throughput: 841.13 mbps
remaining ram: 0 kbytes
total ram: 4211528 kbytes
duplicate: 927068 pages
skipped: 0 pages
normal: 127061 pages
normal bytes: 508244 kbytes
dirty sync count: 5
page size: 4 kbytes
multifd bytes: 0 kbytes
pages-per-second: 25619
dirty pages rate: 183 pages
precopy ram: 515533 kbytes
downtime ram: 1851 kbytes
vfio device transferred: 4880 kbytes
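The two observations on the "info migrate" output (status stuck at "device", no "downtime" field) can be checked mechanically. The sketch below assumes the simple "key: value" layout shown above. `parse_info_migrate` is a hypothetical helper, and the sample is a subset of the output in this report with the annotations removed.

```python
def parse_info_migrate(text):
    """Parse 'key: value' lines; section headers with no value are skipped."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            info[key.strip()] = value.strip()
    return info


# Subset of the "info migrate" output from this report.
SAMPLE = """\
globals:
Migration status: device
total time: 5318 ms
expected downtime: 300 ms
setup: 44 ms
"""

info = parse_info_migrate(SAMPLE)
finished = info.get("Migration status") == "completed"
has_downtime = "downtime" in info  # exact key, not "expected downtime"
print(info["Migration status"], finished, has_downtime)
```

For this report both checks are false: the status is "device" and only "expected downtime" is present.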
Expected results
The downtime value displayed is correct
Actual results
The Total downtime value displayed is wrong (5820289 ms, although the whole migration took about 6.5 s)
Additional info:
(1) The Mellanox CX-7 device I used:
# flint -d 0000:22:00.0 query full
Image type: FS4
FW Version: 28.38.1002
FW Release Date: 3.8.2023
Part Number: MCX75310AAS-HEA_Ax
Description: NVIDIA ConnectX-7 HHHL Adapter Card; 200GbE / NDR200 IB (default mode); Single-port OSFP; PCIe 5.0 x16; Crypto Disabled; Secure Boot Enabled;
Product Version: 28.38.1002
Rom Info: type=UEFI version=14.31.20 cpu=AMD64,AARCH64
          type=PXE version=3.7.201 cpu=AMD64
Description: UID GuidsNumber
Base GUID: 946dae03001db182 2
Base MAC: 946dae1db182 2
Image VSD: N/A
Device VSD: N/A
PSID: MT_0000000844
Security Attributes: secure-fw
Default Update Method: fw_ctrl
Life cycle: GA SECURED
Secure Boot Capable: Enabled
EFUSE Security Ver: 0
Image Security Ver: 0
Security Ver Program: Manually ; Disabled
Encryption: Enabled
(2) How to create an MT2910 VF and set up the VF for migration
1.1 load the mlx5_vfio_pci module
# modprobe mlx5_vfio_pci
1.2 create VF
# sudo sh -c "echo 0 > /sys/bus/pci/devices/0000:b1:00.0/sriov_numvfs"
# sudo sh -c "echo 1 > /sys/bus/pci/devices/0000:b1:00.0/sriov_numvfs"
1.3 set VF mac
# sudo sh -c "ip link set ens2f0np0 vf 0 mac 52:54:00:01:01:01"
1.4 unbind created VF from driver
# sudo sh -c "echo 0000:b1:00.2 > /sys/bus/pci/drivers/mlx5_core/unbind"
1.5 set switchdev mode on PF
# sudo sh -c "devlink dev eswitch set pci/0000:b1:00.0 mode switchdev"
# sudo sh -c "devlink dev eswitch show pci/0000:b1:00.0"
pci/0000:b1:00.0: mode switchdev inline-mode none encap-mode basic
1.6 enable VF's migration feature
# sudo sh -c "devlink port function set pci/0000:b1:00.0/1 migratable enable"
# sudo sh -c "devlink port show pci/0000:b1:00.0/1"
…
function:
hw_addr 52:54:00:01:01:01 roce enable migratable enable
1.7 bind VF to mlx5_vfio_pci driver
# sudo sh -c "echo '15b3 101e' > /sys/bus/pci/drivers/mlx5_vfio_pci/new_id"
# sudo sh -c "echo '15b3 101e' > /sys/bus/pci/drivers/mlx5_vfio_pci/remove_id"
# readlink -f /sys/bus/pci/devices/0000\:b1\:00.2/driver
/sys/bus/pci/drivers/mlx5_vfio_pci
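For reproducing on a different host, the setup steps above can be parameterized. The sketch below only builds the command strings in the same order as steps 1.1 to 1.7 and does not run anything. The function name `vf_setup_commands` and its parameters are hypothetical, the `sudo sh -c` wrappers are left out, and the verification commands (`devlink ... show`, `readlink`) are skipped.

```python
def vf_setup_commands(pf, vf, netdev, mac, vendor_device, vf_index=0, port=1):
    """Build the shell commands for steps 1.1-1.7 as strings (nothing is run)."""
    sysfs = "/sys/bus/pci"
    return [
        "modprobe mlx5_vfio_pci",                                        # 1.1
        f"echo 0 > {sysfs}/devices/{pf}/sriov_numvfs",                   # 1.2
        f"echo 1 > {sysfs}/devices/{pf}/sriov_numvfs",
        f"ip link set {netdev} vf {vf_index} mac {mac}",                 # 1.3
        f"echo {vf} > {sysfs}/drivers/mlx5_core/unbind",                 # 1.4
        f"devlink dev eswitch set pci/{pf} mode switchdev",              # 1.5
        f"devlink port function set pci/{pf}/{port} migratable enable",  # 1.6
        f"echo '{vendor_device}' > {sysfs}/drivers/mlx5_vfio_pci/new_id",     # 1.7
        f"echo '{vendor_device}' > {sysfs}/drivers/mlx5_vfio_pci/remove_id",
    ]


# Values taken from the steps in this report.
commands = vf_setup_commands(
    pf="0000:b1:00.0",
    vf="0000:b1:00.2",
    netdev="ens2f0np0",
    mac="52:54:00:01:01:01",
    vendor_device="15b3 101e",
)
for cmd in commands:
    print(cmd)
```

The commands need root privileges when run by hand, as in the original steps.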