RHEL-128291

[ConnectX-7][SR-IOV] "qemu-kvm: error while loading state section id 55" reported when migrating a vm with mlx5_vfio_pci VF



      What were you trying to do that didn't work?

      Live migration of a VM on an nvidia-grace-hopper machine with a Mellanox card (vfio) fails.
      Please be aware that the same error already occurred on RHEL 9.6, caused by a different FW version on the card.

      This time the card FW is the same on both hosts.

      Please provide the package NVR for which the bug is seen:

      libvirt libvirt-11.5.0-4.1.el10_1.aarch64
      qemu-kvm qemu-kvm-10.0.0-14.el10_1.2.aarch64
      kernel kernel-6.12.0-124.8.1.el10_1.aarch64
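
      For reference, a minimal way to collect these NVRs on both hosts (a sketch; the host names are placeholders):

      # collect the relevant package NVRs on each host
      rpm -q libvirt qemu-kvm kernel
      # compare source vs. destination in one step (hypothetical host names)
      for h in source-host dest-host; do echo "== $h =="; ssh "$h" rpm -q libvirt qemu-kvm kernel; done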

      FW info

       
      Image type:            FS4
      FW Version:            28.45.1200
      FW Release Date:       12.5.2025
      Product Version:       28.45.1200
      Rom Info:              type=UEFI version=14.38.16 cpu=AMD64,AARCH64
                             type=PXE version=3.7.500 cpu=AMD64
      Description:           UID                GuidsNumber
      Image VSD:             N/A
      Device VSD:            N/A
      PSID:                  MT_0000000834
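
      The block above matches mstflint query output; a hedged sketch for regenerating and comparing it across the two hosts (the PF address 0000:01:00.0 is inferred from the hostdev XML below and may differ):

      # query ConnectX-7 FW info via mstflint (PF address assumed)
      mstflint -d 0000:01:00.0 query
      # compare FW between hosts (hypothetical host names)
      diff <(ssh source-host mstflint -d 0000:01:00.0 query) <(ssh dest-host mstflint -d 0000:01:00.0 query)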
      
      

      How reproducible is this bug?:

      100%

      Steps to reproduce

      1. Start the VM (attached), whose definition contains e.g.:
          <hostdev mode="subsystem" type="pci" managed="yes">
            <driver name="vfio" model="mlx5_vfio_pci" />
            <source>
              <address domain="0x0000" bus="0x01" slot="0x00" function="0x2" />
            </source>
            <alias name="hostdev0" />
            <address type="pci" domain="0x0000" bus="0x01" slot="0x00" function="0x0" />
          </hostdev>
      
      
      2. Run the migration (see the verification sketch below the steps):
        virsh -c 'qemu:///system' migrate --live --verbose --domain vm3 --desturi qemu+ssh://10.26.1.121/system
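
      Before step 2 it can help to confirm that the VF is really bound to the mlx5_vfio_pci variant driver on the source; a minimal check, assuming the VF address 0000:01:00.2 from the XML above:

      # driver the VF is currently bound to (expect mlx5_vfio_pci)
      readlink /sys/bus/pci/devices/0000:01:00.2/driver
      # device IDs plus "Kernel driver in use" for the same VF
      lspci -nnk -s 0000:01:00.2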

      Expected results: migration will pass

      (previously passing tests:
      https://libvirt-rhos-jenkins-product.hosted.upshift.rdu2.redhat.com/view/libvirt/view/RHEL-10.1/view/RHEL-10.1%20aarch64/job/libvirt-RHEL-10.1-runtest-aarch64-function-migration_modular_4/14/testReport/ )

      Actual results: migration fails with

      Command result:

      Migration: [ 0.00 %]error: operation failed: migration failed. Message from the source host: operation failed: job 'migration out' failed: Sibling indicated error 1. Message from the destination host: operation failed: job 'migration in' failed: load of migration failed: Invalid argument
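
      If more context than this one-liner is needed, raising libvirt's log verbosity on both hosts shows where the incoming migration aborts; a sketch using the filter set from libvirt's debug-log documentation (modular daemon assumed):

      # /etc/libvirt/virtqemud.conf on source and destination
      log_filters="3:remote 4:event 3:util.json 3:rpc 1:*"
      log_outputs="1:file:/var/log/libvirt/virtqemud.log"
      # then: systemctl restart virtqemud, and retry the migration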
      

      From the destination host's avocado-vt-vm1.log:

      2025-11-13T10:51:06.361599Z qemu-kvm: error while loading state section id 55(0000:00:01.0:00.0/vfio)
      2025-11-13 10:51:06.442+0000: shutting down, reason=failed
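
      To pull the same evidence straight off the destination after a failed attempt (log path assumes libvirt's default QEMU log directory; the domain log name may differ):

      # QEMU's own load error on the destination
      grep 'error while loading state' /var/log/libvirt/qemu/*.log
      # kernel-side vfio/mlx5 messages around the failure window
      journalctl -k --since=-10min | grep -iE 'vfio|mlx5'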
      
      

      Other failing tests:
      https://libvirt-rhos-jenkins-product.hosted.upshift.rdu2.redhat.com/job/libvirt-RHEL-10.1-runtest-aarch64-function-migration_modular_4/17/testReport/

      List of tests:

      sriov.migration.vfio_variant_driver.non_p2p_live.without_postcopy.without_iommu.single_iface.hostdev_device.managed_no
      sriov.migration.vfio_variant_driver.non_p2p_live.without_postcopy.without_iommu.single_iface.hostdev_device.managed_yes.default
      sriov.migration.vfio_variant_driver.non_p2p_live.without_postcopy.without_iommu.single_iface.hostdev_device.managed_yes.mlx5_vfio
      sriov.migration.vfio_variant_driver.non_p2p_live.without_postcopy.without_iommu.single_iface.hostdev_interface.managed_no
      sriov.migration.vfio_variant_driver.non_p2p_live.without_postcopy.without_iommu.single_iface.hostdev_interface.managed_yes.default
      sriov.migration.vfio_variant_driver.non_p2p_live.without_postcopy.without_iommu.single_iface.hostdev_interface.managed_yes.mlx5_vfio
      sriov.migration.vfio_variant_driver.p2p_live.without_postcopy.without_iommu.single_iface.hostdev_device.managed_no
      sriov.migration.vfio_variant_driver.p2p_live.without_postcopy.without_iommu.single_iface.hostdev_device.managed_yes.default
      sriov.migration.vfio_variant_driver.p2p_live.without_postcopy.without_iommu.single_iface.hostdev_device.managed_yes.mlx5_vfio
      sriov.migration.vfio_variant_driver.p2p_live.without_postcopy.without_iommu.single_iface.hostdev_interface.managed_no
      sriov.migration.vfio_variant_driver.p2p_live.without_postcopy.without_iommu.single_iface.hostdev_interface.managed_yes.default
      sriov.migration.vfio_variant_driver.p2p_live.without_postcopy.without_iommu.single_iface.hostdev_interface.managed_yes.mlx5_vfio
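
      To rerun a single case from this list locally, the usual avocado-vt invocation looks roughly like this (a sketch; assumes avocado-vt with the tp-libvirt provider is set up, and the exact test id prefix may differ):

      # rerun one failing case from the list above
      avocado run --vt-type libvirt \
          sriov.migration.vfio_variant_driver.non_p2p_live.without_postcopy.without_iommu.single_iface.hostdev_device.managed_yes.mlx5_vfio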
