RHEL / RHEL-7115

Recovering a postcopy migration before the network issue is resolved leads to a wrong QEMU migration status

    • qemu-kvm-8.2.0-1.el9
    • None
    • Moderate
    • CustomerScenariosInitiative
    • rhel-sst-virtualization
    • ssg_virtualization
    • None
    • False
    • None
    • None
    • None
    • Bug Fix
      .Resuming a postcopy VM migration now works correctly.

      Previously, when performing a postcopy migration of a virtual machine (VM), if a proxy network failure occurred during the RECOVER phase of the migration, the VM became unresponsive and the migration could not be resumed. Instead, the recovery command displayed the following error:

      ----
      error: Requested operation is not valid: QEMU reports migration is still running
      ----

      With this update, the problem has been fixed, and postcopy migrations now resume correctly in the described circumstances.
    • Done
    • None

      Description of problem:
      Perform a postcopy migration over the unix socket transport through a proxy, then break the proxy; the postcopy migration fails. Trying to recover the migration before the proxy is fixed also fails, as expected. After the proxy is fixed, trying to recover the migration again still fails with:
      "error: Requested operation is not valid: QEMU reports migration is still running"

      Version-Release number of selected component (if applicable):
      libvirt-8.5.0-2.el9.x86_64
      qemu-kvm-7.0.0-9.el9.x86_64

      How reproducible:
      100%

      Steps to Reproduce:
      1. Start a VM (named uefi in the commands below), and set up a proxy between the source and destination hosts:

         1) On the destination host:
            socat tcp-listen:22222,reuseaddr,fork unix:/var/run/libvirt/virtqemud-sock
            socat tcp-listen:33333,reuseaddr,fork unix:/tmp/33333-sock

         2) On the source host:
            socat unix-listen:/tmp/sock,reuseaddr,fork tcp:<dest_host>:22222
            socat unix-listen:/tmp/33333-sock,reuseaddr,fork tcp:<dest_host>:33333

      2. Migrate the VM to the other host over the unix transport:

         virsh migrate uefi qemu+unix:///system?socket=/tmp/sock --live --postcopy --undefinesource --persistent --bandwidth 3 --postcopy-bandwidth 3 --migrateuri unix:///tmp/33333-sock

      3. Switch the migration to postcopy:

         virsh migrate-postcopy uefi

      4. Break the migration proxy (the following relay on the destination host); the migration fails immediately:

         socat tcp-listen:33333,reuseaddr,fork unix:/tmp/33333-sock

      5. Try to recover the postcopy migration. It fails, as expected:

         virsh migrate uefi qemu+unix:///system?socket=/tmp/sock --live --postcopy --undefinesource --persistent --bandwidth 3 --postcopy-bandwidth 3 --migrateuri unix:///tmp/33333-sock --postcopy-resume
         error: operation failed: job 'migration in' failed in post-copy phase

      6. Fix the proxy.

      7. Try to recover the postcopy migration again. It fails unexpectedly:

         virsh migrate uefi qemu+unix:///system?socket=/tmp/sock --live --postcopy --undefinesource --persistent --bandwidth 3 --postcopy-bandwidth 3 --migrateuri unix:///tmp/33333-sock --postcopy-resume
         error: Requested operation is not valid: QEMU reports migration is still running

      8. Try to abort the migration:

         virsh domjobabort uefi --postcopy
         error: internal error: unable to execute QEMU command 'migrate-pause': migrate-pause is currently only supported during postcopy-active state
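      The socat relays from the proxy setup step can be wrapped in two shell functions, which makes the reproducer repeatable. This is a minimal bash sketch, not part of the original report. The function names, the run wrapper and the DRY_RUN switch are hypothetical. The socat commands themselves are copied from the steps above.

```shell
#!/usr/bin/env bash
# Minimal sketch of the socat relays used in this reproducer.
# run, start_dest_relays, start_src_relays and DRY_RUN are hypothetical names.

# Start a relay in the background, or only print the command when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@" &
    fi
}

# Destination host: expose virtqemud (port 22222) and the migration
# stream socket /tmp/33333-sock (port 33333) over TCP.
start_dest_relays() {
    run socat tcp-listen:22222,reuseaddr,fork unix:/var/run/libvirt/virtqemud-sock
    run socat tcp-listen:33333,reuseaddr,fork unix:/tmp/33333-sock
}

# Source host: local unix sockets that forward to the destination's TCP ports.
start_src_relays() {
    local dest_host=$1
    run socat unix-listen:/tmp/sock,reuseaddr,fork "tcp:${dest_host}:22222"
    run socat unix-listen:/tmp/33333-sock,reuseaddr,fork "tcp:${dest_host}:33333"
}
```

      Run start_dest_relays on the destination host first, then run start_src_relays <dest_host> on the source host. Step 4 probably means stopping the destination's port-33333 relay. If so, one way to break the proxy is pkill -f 'tcp-listen:33333' on the destination host, and fixing it in step 6 means starting that relay again.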

      Actual results:
      Postcopy recovery fails in step 7.

      Expected results:
      Postcopy recovery succeeds in step 7.
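      QEMU's own migration status can be read through libvirt, either to confirm the expected result or to see why a recovery is refused. The sketch below assumes virsh qemu-monitor-command and QEMU's query-migrate QMP command. migration_status and postcopy_action are hypothetical helpers. The status-to-action mapping is an assumption drawn from the errors in this report, not from libvirt's source: migrate-pause is accepted only in postcopy-active, and a resume is refused while QEMU reports the migration as still running.

```shell
#!/usr/bin/env bash
# Minimal sketch: read QEMU's migration status and map it to the recovery
# action that status is assumed to permit. Helper names are hypothetical.

# Print the "status" field of a query-migrate reply read from stdin.
# Works for compact and --pretty JSON; assumes a single "status" field.
migration_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Map a QEMU migration status to the virsh action expected to work in it
# (assumed mapping, based on the errors shown in this report).
postcopy_action() {
    case "$1" in
        postcopy-active)  echo "pause: virsh domjobabort <domain> --postcopy" ;;
        postcopy-paused)  echo "resume: virsh migrate ... --postcopy-resume" ;;
        postcopy-recover) echo "wait: recovery in progress" ;;
        completed)        echo "done: migration finished" ;;
        *)                echo "unknown: $1" ;;
    esac
}

# Example (needs a running domain, so it is left commented out):
#   reply=$(virsh qemu-monitor-command uefi '{"execute":"query-migrate"}')
#   postcopy_action "$(printf '%s\n' "$reply" | migration_status)"
```

      After a successful recovery, the source status would be expected to go from postcopy-paused through postcopy-recover back to postcopy-active, and finally to completed. In the reported failure, the errors from steps 7 and 8 suggest that QEMU was in neither postcopy-paused nor postcopy-active.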

      Additional info:
      The problem cannot be reproduced with the TCP transport.

              zhexu@redhat.com Peter Xu
              rhn-support-fjin Fangge Jin
              Peter Xu
              Xiaohui Li
              Jiří Herrmann
              Votes: 0
              Watchers: 17
