Bug
Resolution: Done
rhel-9.6
rhel-sst-virtualization
ssg_virtualization
aarch64
What were you trying to do that didn't work?
Migrating a VM with an mlx5_vfio_pci VF fails with the error below:
Migration: [ 0.43 %]error: internal error: QEMU unexpectedly closed the monitor (vm='avocado-vt-vm1'):
2024-11-19T01:19:25.726386Z qemu-kvm: error while loading state section id 55(0000:00:01.0:00.0/vfio)
2024-11-19T01:19:25.726730Z qemu-kvm: load of migration failed: Invalid argument
Please provide the package NVR for which the bug is seen:
libvirt-10.9.0-1.el9.aarch64
qemu-kvm-9.1.0-2.el9.aarch64
edk2-aarch64-20240524-9.el9.noarch
kernel-5.14.0-528.el9.aarch64
source host: nvidia-grace-hopper-06
source host's FW version: 28.43.1014
source iface: 0000:01:00.1 Ethernet controller: Mellanox Technologies MT2910 Family [ConnectX-7]
destination host: nvidia-grace-hopper-09
destination host's FW version: 28.39.1002
destination iface: 0000:01:00.1 Ethernet controller: Mellanox Technologies MT2910 Family [ConnectX-7]
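The two hosts report different firmware versions (28.43.1014 on the source, 28.39.1002 on the destination). This report does not establish whether that difference is related to the failure; the sketch below only shows one way to compare the versions before attempting a migration. The function name `fw_compare` is hypothetical, and the `ethtool -i` line in the comment assumes the PF interface name enp1s0f1np1 used in the reproduction steps.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): compare two firmware
# version strings and print "match" or "mismatch: <src> vs <dst>".
fw_compare() {
    if [ "$1" = "$2" ]; then
        echo "match"
    else
        echo "mismatch: $1 vs $2"
    fi
}

# One way to collect the version on each host (assumes ethtool is installed):
#   ethtool -i enp1s0f1np1 | awk '/^firmware-version:/ {print $2}'

# Versions as listed in this report.
fw_compare 28.43.1014 28.39.1002
```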
How reproducible is this bug?:
100%
Steps to reproduce
- Enable 2 VFs and set them as migratable on both the source and destination hosts (refer to Polarion case VIRT-299412):
  mlxconfig -d 0000:01:00.1 query VF_MIGRATION_MODE
  echo 2 > /sys/devices/pci0000:00/0000:00:00.0/0000:01:00.1/sriov_numvfs
  ip link set enp1s0f1np1 vf 0 mac 52:54:00:01:01:01
  echo 0000:01:02.2 > /sys/bus/pci/drivers/mlx5_core/unbind
  echo 0000:01:02.3 > /sys/bus/pci/drivers/mlx5_core/unbind
  devlink dev eswitch set pci/0000:01:00.1 mode switchdev
  devlink dev eswitch show pci/0000:01:00.1
  devlink port
  devlink port function set pci/0000:01:00.1/65537 migratable enable
  devlink port function set pci/0000:01:00.1/65538 migratable enable
  devlink port
  modprobe mlx5_vfio_pci
  virsh nodedev-detach pci_0000_01_02_2 --driver mlx5_vfio_pci
  virsh nodedev-detach pci_0000_01_02_3 --driver mlx5_vfio_pci
- Start a VM with an mlx5_vfio_pci VF:
  <hostdev mode='subsystem' type='pci' managed='yes'>
    <driver name='vfio'/>
    <source>
      <address domain='0x0000' bus='0x01' slot='0x02' function='0x2'/>
    </source>
    <alias name='ua-1bcbabff-f022-4d4f-ae8c-13f2d3a07906'/>
    <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
  </hostdev>
- virsh start <vm>
- virsh migrate --live --verbose --domain <vm> --desturi qemu+tcp://<dest ip>/system
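Before starting the VM, it can help to confirm that both VFs actually ended up bound to mlx5_vfio_pci on each host. The following is a minimal sketch that reads the `driver` symlink in sysfs; the function name `vf_driver` and its optional sysfs-root argument are hypothetical and exist only to make the check easy to try out.

```shell
#!/bin/sh
# Hypothetical helper: print the name of the driver a PCI device is bound to,
# or "unbound" if the device has no driver symlink.
# $1 = PCI address, $2 = sysfs root (optional, defaults to /sys).
vf_driver() {
    link="${2:-/sys}/bus/pci/devices/$1/driver"
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"
    else
        echo "unbound"
    fi
}

# VF addresses taken from the steps above.
for vf in 0000:01:02.2 0000:01:02.3; do
    echo "$vf: $(vf_driver "$vf")"
done
```

After the nodedev-detach steps, each VF would be expected to report mlx5_vfio_pci.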
Expected results
The VM should be migrated to the destination host.
Actual results
It reports an error:
# virsh migrate --live --verbose --domain avocado-vt-vm1 --desturi qemu+tcp://10.26.1.121/system
Migration: [ 0.43 %]error: internal error: QEMU unexpectedly closed the monitor (vm='avocado-vt-vm1'):
2024-11-19T01:19:25.726386Z qemu-kvm: error while loading state section id 55(0000:00:01.0:00.0/vfio)
tail -f avocado-vt-vm1.log:
2024-11-19T02:14:25.206085Z qemu-kvm: failed to save SaveStateEntry with id(name): 3(ram): -5
2024-11-19T02:14:25.268250Z qemu-kvm: Unable to shutdown socket: Transport endpoint is not connected
2024-11-19T02:14:25.268276Z qemu-kvm: Sibling indicated error 1
dmesg on destination host:
# [ 4429.528363] mlx5_vfio_pci 0000:01:02.2: enabling device (0000 -> 0002)
[ 4429.998017] mlx5_core 0000:01:00.1: mlx5_cmd_out_err:808:(pid 28296): LOAD_VHCA_STATE(0x119) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0xe9ecae), err(-22)
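For triage, the failing command, status, and syndrome can be pulled out of the mlx5_cmd_out_err line from the destination host. This is a rough sketch using sed; the function name `mlx5_err_fields` is hypothetical, and the pattern is fitted only to the message format seen in this report. The syndrome value is firmware-specific and is not decoded here.

```shell
#!/bin/sh
# Hypothetical helper: read mlx5_cmd_out_err lines on stdin and print
# the command name, status text, and syndrome for each matching line.
mlx5_err_fields() {
    sed -n 's/.*: \([A-Z_]*\)(0x[0-9a-f]*) op_mod(0x[0-9a-f]*) failed, status \(.*\)(0x[0-9a-f]*), syndrome (\(0x[0-9a-f]*\)).*/cmd=\1 status="\2" syndrome=\3/p'
}

# Sample input: the kernel message from this report.
echo '[ 4429.998017] mlx5_core 0000:01:00.1: mlx5_cmd_out_err:808:(pid 28296): LOAD_VHCA_STATE(0x119) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0xe9ecae), err(-22)' | mlx5_err_fields
```

On a live host the same function could be fed from `dmesg | grep mlx5_cmd_out_err`.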