  1. OpenShift Virtualization
  2. CNV-23372

[2152909] Unable to perform node to node migration

    • High
    • None

      Description of problem:

      We want to live-migrate a VM from one node to another, but the migration fails with the errors below.

      Linux:
      VirtualMachineInstance migration uid 29acdc88-b8d7-4ab2-add2-1131a6d8868a failed. reason:Live migration failed error encountered during MigrateToURI3 libvirt api call: virError(Code=1, Domain=10, Message='internal error: unable to execute QEMU command 'cont': Failed to get "write" lock')

      Windows:
      server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=10, Message='internal error: unable to execute QEMU command 'cont': Failed to get \"consistent read\" lock')"
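Both messages report QEMU failing to acquire an image lock while executing 'cont'; only the lock kind differs ("write" vs. "consistent read"). When triaging many such events, the lock kind can be pulled out of the message text. The following is a minimal sketch, not part of the original report; the helper name `lock_kind` is hypothetical, and the pattern is based only on the two messages quoted above.

```python
import re
from typing import Optional

# Matches the lock-failure text seen in both errors above. The optional
# backslashes cover the escaped quotes in the Windows VM message.
LOCK_RE = re.compile(r'Failed to get \\?"([^"\\]+)\\?" lock')


def lock_kind(message: str) -> Optional[str]:
    """Return the lock kind named in the error message, or None if absent."""
    match = LOCK_RE.search(message)
    return match.group(1) if match else None
```

For the Linux VM message this returns `write`; for the Windows VM message it returns `consistent read`.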

      > I am using Ceph (not Red Hat Ceph) as my storage class, in RWX mode.

      > If I use a node selector in my YAML, the VM runs on that particular node without any error.

      > On initiating a node-to-node migration from the UI, one more virt-launcher pod is created on another node, but it goes to Completed instead of staying in Running. The migration then fails with the error above.

      > In the second virt-launcher pod, virsh list shows no running VMs.

      > In the existing virt-launcher pod, we observe:

      2022-12-13 13:14:22.884+0000: initiating migration
      2022-12-13T13:14:25.532764Z qemu-kvm: warning: Failed to unlock byte 201
      2022-12-13T13:14:25.532834Z qemu-kvm: warning: Failed to unlock byte 201
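To check whether the unlock warnings recur across migration attempts, the virt-launcher log can be scanned for them. This is a rough sketch, assuming the log lines have the same shape as the excerpt above; the function name `unlock_warnings` is hypothetical.

```python
import re
from collections import Counter

# Pattern taken from the qemu-kvm warning lines in the log excerpt above.
UNLOCK_RE = re.compile(r"qemu-kvm: warning: Failed to unlock byte (\d+)")


def unlock_warnings(log_text: str) -> Counter:
    """Count 'Failed to unlock byte N' warnings, keyed by byte offset N."""
    return Counter(int(m.group(1)) for m in UNLOCK_RE.finditer(log_text))


sample = """\
2022-12-13 13:14:22.884+0000: initiating migration
2022-12-13T13:14:25.532764Z qemu-kvm: warning: Failed to unlock byte 201
2022-12-13T13:14:25.532834Z qemu-kvm: warning: Failed to unlock byte 201
"""
print(unlock_warnings(sample))  # Counter({201: 2})
```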

      Version-Release number of selected component (if applicable):

      4.10.6

      How reproducible:

      75%

      Actual results:

      Node-to-node live migration fails with the errors above. It should complete without any error.

      Additional info:

      Attach virsh dumpxml output, libvirtd logs, and virt-launcher logs

      >
      virsh # list
       Id   Name                                    State
      ----------------------------------------------------------
       1    migration_migration-test-10-1-100-181   paused

      virsh # resume migration_migration-test-10-1-100-181
      error: Failed to resume domain 'migration_migration-test-10-1-100-181'
      error: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainGetBlockInfo)
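The domain is left paused after the failed migration. To spot this state across several virt-launcher pods, the `virsh list` output can be parsed for paused domains. The sketch below assumes the usual three-column Id / Name / State layout shown above; `paused_domains` is a hypothetical helper, not an existing tool.

```python
from typing import List


def paused_domains(virsh_list_output: str) -> List[str]:
    """Return the names of domains whose state is 'paused'."""
    paused = []
    for line in virsh_list_output.splitlines():
        fields = line.split()
        # Data rows look like: <id> <name> <state...>; the id is a number,
        # or "-" for a domain that is not running.
        if len(fields) >= 3 and (fields[0].isdigit() or fields[0] == "-"):
            state = " ".join(fields[2:])
            if state == "paused":
                paused.append(fields[1])
    return paused
```

Fed the listing above, it returns a single entry, `migration_migration-test-10-1-100-181`. The prompt line, header row, and separator line are skipped because they do not start with a numeric id.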

              sgott@redhat.com Stuart Gott
              princesarvaiya Prince Sarvaiya (Inactive)