Bug
Resolution: Unresolved
Normal
rhel-10.1
rhel-virt-hwe-arm-1
aarch64
What were you trying to do that didn't work?
Live migration of a VM on an NVIDIA Grace Hopper machine with a Mellanox card (vfio).
Please be aware that the same error was already seen on RHEL 9.6, where it was caused by a different FW version of the card driver.
This time the card driver is the same on both hosts.
Please provide the package NVR for which the bug is seen:
libvirt libvirt-11.5.0-4.1.el10_1.aarch64
qemu-kvm qemu-kvm-10.0.0-14.el10_1.2.aarch64
kernel kernel-6.12.0-124.8.1.el10_1.aarch64
FW info
Image type: FS4
FW Version: 28.45.1200
FW Release Date: 12.5.2025
Product Version: 28.45.1200
Rom Info: type=UEFI version=14.38.16 cpu=AMD64,AARCH64
type=PXE version=3.7.500 cpu=AMD64
Description: UID GuidsNumber
Image VSD: N/A
Device VSD: N/A
PSID: MT_0000000834
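To double-check that both hosts really report the same firmware, the query output above can be parsed into key/value pairs and compared. The following is only a minimal sketch in Python: the helper names `parse_fw_info` and `fw_mismatches` are hypothetical, and it assumes the output from each host has been saved as text in the `Key: value` layout shown above.

```python
def parse_fw_info(text):
    """Parse 'Key: value' lines into a dict; lines without a colon are skipped."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            info[key.strip()] = value.strip()
    return info


def fw_mismatches(src, dst, keys=("FW Version", "PSID")):
    """Return {key: (source value, destination value)} for every key that differs."""
    return {k: (src.get(k), dst.get(k)) for k in keys if src.get(k) != dst.get(k)}


if __name__ == "__main__":
    # Excerpt of the FW info reported above; the same text is used for both
    # hosts here purely for illustration.
    source_host = "FW Version: 28.45.1200\nPSID: MT_0000000834\n"
    dest_host = source_host
    print(fw_mismatches(parse_fw_info(source_host), parse_fw_info(dest_host)))
```

An empty result means the selected fields match on both hosts.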
How reproducible is this bug?:
100%
Steps to reproduce
- start the VM (attached) containing e.g.
<hostdev mode="subsystem" type="pci" managed="yes">
  <driver name="vfio" model="mlx5_vfio_pci" />
  <source>
    <address domain="0x0000" bus="0x01" slot="0x00" function="0x2" />
  </source>
  <alias name="hostdev0" />
  <address type="pci" domain="0x0000" bus="0x01" slot="0x00" function="0x0" />
</hostdev>
- run the migration
virsh -c 'qemu:///system' migrate --live --verbose --domain vm3 --desturi qemu+ssh://10.26.1.121/system
Expected results: migration passes.
Previously passing tests:
https://libvirt-rhos-jenkins-product.hosted.upshift.rdu2.redhat.com/view/libvirt/view/RHEL-10.1/view/RHEL-10.1%20aarch64/job/libvirt-RHEL-10.1-runtest-aarch64-function-migration_modular_4/14/testReport/
Actual results: migration fails with
Command result:
Migration: [ 0.00 %]error: operation failed: migration failed. Message from the source host: operation failed: job 'migration out' failed: Sibling indicated error 1. Message from the destination host: operation failed: job 'migration in' failed: load of migration failed: Invalid argument
From the destination host, avocado-vt-vm1.log:
2025-11-13T10:51:06.361599Z qemu-kvm: error while loading state section id 55(0000:00:01.0:00.0/vfio)
2025-11-13 10:51:06.442+0000: shutting down, reason=failed
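When scanning several destination logs for this failure, the section id and device name can be pulled out of the qemu-kvm message with a regular expression. This is a rough sketch only: `find_failed_sections` is a hypothetical helper, and the pattern is based solely on the single log line quoted above, so other message variants may not match.

```python
import re

# Matches e.g. "error while loading state section id 55(0000:00:01.0:00.0/vfio)"
SECTION_ERROR = re.compile(r"error while loading state section id (\d+)\((.*)\)")


def find_failed_sections(log_text):
    """Return a list of (section_id, section_name) for each matching log line."""
    failures = []
    for line in log_text.splitlines():
        match = SECTION_ERROR.search(line)
        if match:
            failures.append((int(match.group(1)), match.group(2)))
    return failures


if __name__ == "__main__":
    log = (
        "2025-11-13T10:51:06.361599Z qemu-kvm: error while loading state "
        "section id 55(0000:00:01.0:00.0/vfio)\n"
        "2025-11-13 10:51:06.442+0000: shutting down, reason=failed\n"
    )
    print(find_failed_sections(log))  # [(55, '0000:00:01.0:00.0/vfio')]
```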
Other failing tests:
https://libvirt-rhos-jenkins-product.hosted.upshift.rdu2.redhat.com/job/libvirt-RHEL-10.1-runtest-aarch64-function-migration_modular_4/17/testReport/
List of tests:
sriov.migration.vfio_variant_driver.non_p2p_live.without_postcopy.without_iommu.single_iface.hostdev_device.managed_no
sriov.migration.vfio_variant_driver.non_p2p_live.without_postcopy.without_iommu.single_iface.hostdev_device.managed_yes.default
sriov.migration.vfio_variant_driver.non_p2p_live.without_postcopy.without_iommu.single_iface.hostdev_device.managed_yes.mlx5_vfio
sriov.migration.vfio_variant_driver.non_p2p_live.without_postcopy.without_iommu.single_iface.hostdev_interface.managed_no
sriov.migration.vfio_variant_driver.non_p2p_live.without_postcopy.without_iommu.single_iface.hostdev_interface.managed_yes.default
sriov.migration.vfio_variant_driver.non_p2p_live.without_postcopy.without_iommu.single_iface.hostdev_interface.managed_yes.mlx5_vfio
sriov.migration.vfio_variant_driver.p2p_live.without_postcopy.without_iommu.single_iface.hostdev_device.managed_no
sriov.migration.vfio_variant_driver.p2p_live.without_postcopy.without_iommu.single_iface.hostdev_device.managed_yes.default
sriov.migration.vfio_variant_driver.p2p_live.without_postcopy.without_iommu.single_iface.hostdev_device.managed_yes.mlx5_vfio
sriov.migration.vfio_variant_driver.p2p_live.without_postcopy.without_iommu.single_iface.hostdev_interface.managed_no
sriov.migration.vfio_variant_driver.p2p_live.without_postcopy.without_iommu.single_iface.hostdev_interface.managed_yes.default
sriov.migration.vfio_variant_driver.p2p_live.without_postcopy.without_iommu.single_iface.hostdev_interface.managed_yes.mlx5_vfio
Relates to:
RHEL-67996 [vfio migration][aarch64][4k] "qemu-kvm: error while loading state section id 55" reported when migrating a vm with mlx5_vfio_pci VF (Closed)