RHEL-7115

Recovering postcopy migration before the network issue is resolved leads to a wrong QEMU migration status

    • qemu-kvm-8.2.0-1.el9
    • Normal
    • CustomerScenariosInitiative
    • sst_virtualization
    • ssg_virtualization
    • Known Issue
      .Resuming a postcopy VM migration fails in some cases

      Currently, when performing a postcopy migration of a virtual machine (VM), if a proxy network failure occurs during the RECOVER phase of the migration, the VM becomes unresponsive and the migration cannot be resumed. Instead, the recovery command displays the following error:

      ----
      error: Requested operation is not valid: QEMU reports migration is still running
      ----
    • Done

      Description of problem:
Perform a postcopy migration over a unix+proxy transport and break the proxy; the postcopy migration fails. Trying to recover the migration before the proxy is fixed fails, as expected. After fixing the proxy, trying to recover the migration again still fails with:
"error: Requested operation is not valid: QEMU reports migration is still running"

      Version-Release number of selected component (if applicable):
      libvirt-8.5.0-2.el9.x86_64
      qemu-kvm-7.0.0-9.el9.x86_64

      How reproducible:
      100%

Steps to Reproduce:
1. Start a VM.

2. Set up a proxy between the source and destination hosts.
   On the destination host:
   socat tcp-listen:22222,reuseaddr,fork unix:/var/run/libvirt/virtqemud-sock
   socat tcp-listen:33333,reuseaddr,fork unix:/tmp/33333-sock
   On the source host:
   socat unix-listen:/tmp/sock,reuseaddr,fork tcp:<dest_host>:22222
   socat unix-listen:/tmp/33333-sock,reuseaddr,fork tcp:<dest_host>:33333
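As a sanity check before migrating (a suggestion, not part of the original steps), the control-plane proxy can be verified from the source host:

   virsh -c 'qemu+unix:///system?socket=/tmp/sock' list --all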

3. Migrate the VM to the destination host with the unix transport:
   virsh migrate uefi qemu+unix:///system?socket=/tmp/sock --live --postcopy --undefinesource --persistent --bandwidth 3 --postcopy-bandwidth 3 --migrateuri unix:///tmp/33333-sock

4. Switch the migration to postcopy:
   virsh migrate-postcopy uefi
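While steps 3 and 4 run, the migration job can be watched from the source host (a monitoring suggestion, not part of the original steps; the reported fields vary with the libvirt version):

   virsh domjobinfo uefi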

5. Break the proxy; the migration fails immediately. The proxy in question is the destination-side listener for the migration socket:
   socat tcp-listen:33333,reuseaddr,fork unix:/tmp/33333-sock
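The report does not state how the proxy was broken; one way (an assumption) is to kill the destination-side socat listener for the migration port:

   pkill -f 'tcp-listen:33333'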

6. Try to recover the postcopy migration; it fails, as expected:
   virsh migrate uefi qemu+unix:///system?socket=/tmp/sock --live --postcopy --undefinesource --persistent --bandwidth 3 --postcopy-bandwidth 3 --migrateuri unix:///tmp/33333-sock --postcopy-resume
   error: operation failed: job 'migration in' failed in post-copy phase

7. Fix the proxy.

8. Try to recover the postcopy migration again; it fails unexpectedly:
   virsh migrate uefi qemu+unix:///system?socket=/tmp/sock --live --postcopy --undefinesource --persistent --bandwidth 3 --postcopy-bandwidth 3 --migrateuri unix:///tmp/33333-sock --postcopy-resume
   error: Requested operation is not valid: QEMU reports migration is still running
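To inspect the state that triggers this error (a diagnostic suggestion, not part of the original report), the destination side can be queried through the same proxy socket; the error text indicates the destination QEMU still considers the migration active:

   virsh -c 'qemu+unix:///system?socket=/tmp/sock' domjobinfo uefi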

9. Try to abort the migration; this also fails:
   virsh domjobabort uefi --postcopy
   error: internal error: unable to execute QEMU command 'migrate-pause': migrate-pause is currently only supported during postcopy-active state
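As the error text shows, libvirt implements "domjobabort --postcopy" with QEMU's migrate-pause QMP command; the same rejection can be observed by issuing it directly (a diagnostic suggestion, not part of the original report; which host it applies to depends on where the abort is attempted):

   virsh qemu-monitor-command uefi '{"execute": "migrate-pause"}'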

Actual results:
Postcopy recovery fails in step 8.

Expected results:
Postcopy recovery succeeds in step 8.

Additional info:
The problem cannot be reproduced with the tcp transport.
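For comparison (a sketch only; the exact command used in the original tcp testing is not recorded here), a tcp-transport migration, with which the problem does not occur, might look like:

   virsh migrate uefi qemu+ssh://<dest_host>/system --live --postcopy --undefinesource --persistent --bandwidth 3 --postcopy-bandwidth 3 --migrateuri tcp://<dest_host>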
