  1. OpenShift Virtualization
  2. CNV-34761

[2247593] Live Migration fails after volume hotplug


    • VERIFIED
    • Live migration cannot be enabled for a virtual machine instance (VMI) after a hotplug volume has been added and removed. (BZ#2247593)
    • Known Issue
    • Done
    • Storage Core Sprint 244, Storage Core Sprint 245, Storage Core Sprint 246
    • Urgent
    • No

      Original issue in Jira: https://issues.redhat.com/browse/CNV-34724. It was moved to Bugzilla because for CNV 4.14.0 we are still using Bugzilla to report bugs. We will use Jira for all components starting with CNV 4.15.

      Description of problem:

      Live migration is no longer possible after a VMI has a hotplug volume added and removed.

      Although this is not directly related to hypershift/kubevirt, the issue was discovered while testing hypershift/kubevirt, because we use hotplug to provide volumes to the worker node VMs. After using kubevirt-csi to hotplug a volume and then removing that volume, we noticed that the VMIs later failed to live migrate.

      It is trivial to reproduce this error outside of hypershift/kubevirt using a Fedora VM.

      Version-Release number of selected component (if applicable):

      CNV 4.14

      How reproducible:

      100%

      Steps to Reproduce:

      See the live-migration-failure-script.sh script attached to this issue to reproduce this easily. Below are the general steps that script performs.

      This was reproduced using ODF 4.13 on OCP 4.14.0 with the latest CNV 4.14.0 pre-release.

      1. Create a VM and start it.
      2. Live migrate the VMI to prove it is live migratable.
      3. Add a hotplug volume (RWX) to the VMI.
      4. Remove the hotplug volume from the VMI.
      5. Live migration is now permanently broken for the VMI.
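
The steps above can be sketched as a sequence of virtctl invocations. This is not the attached live-migration-failure-script.sh, only a minimal illustration under some assumptions: the VM object already exists in the cluster, the RWX PVC already exists, and the volume name hotplug-rwx-pvc is hypothetical (test-vm is the VMI name that appears in the logs below). The sketch only prints the commands by default; a real run would also need to wait for each migration and hotplug operation to finish before issuing the next command.

```python
import shlex
import subprocess


def repro_commands(vm, volume):
    """Return the virtctl invocations for steps 1-5, in order.

    `vm` is an existing VirtualMachine; `volume` is an existing RWX PVC
    (both names are supplied by the caller and are assumptions here).
    """
    return [
        ["virtctl", "start", vm],                                    # step 1: start the VM
        ["virtctl", "migrate", vm],                                  # step 2: baseline migration
        ["virtctl", "addvolume", vm, f"--volume-name={volume}"],     # step 3: hotplug the volume
        ["virtctl", "removevolume", vm, f"--volume-name={volume}"],  # step 4: unplug it again
        ["virtctl", "migrate", vm],                                  # step 5: this migration fails
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            # No waiting between steps is done here; a real script must
            # poll the VMI / migration status before continuing.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(repro_commands("test-vm", "hotplug-rwx-pvc"))
```

Steps 2 and 5 issue the same command, which is the point of the reproducer: the only difference between the working and the failing migration is the add/remove hotplug cycle in between.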

      Actual results:

      Live migration fails after adding and removing an RWX hotplug volume.

      Expected results:

      Live migration should continue to work after hotplug

      Additional info:

      The qemu log on the source pod reports the following after the migration fails.

      
      

      #2023-10-30T19:53:14.880648Z qemu-kvm: qemu_savevm_state_complete_precopy_non_iterable: bdrv_inactivate_all() failed (-1)
      #2023-10-30T19:53:15.053395Z qemu-kvm: Unable to read from socket: Bad file descriptor
      #2023-10-30T19:53:15.053477Z qemu-kvm: Unable to read from socket: Bad file descriptor
      #2023-10-30T19:53:15.053500Z qemu-kvm: Unable to read from socket: Bad file descriptor

      The libvirt logs on the source only indicate that the migration failed with an internal error ("client socket is closed").

      
      
      {"component":"virt-launcher","kind":"","level":"error","msg":"Live migration failed.","name":"test-vm","namespace":"default","pos":"live-migration-source.go:718","reason":"error encountered during MigrateToURI3 libvirt api call: virError(Code=1, Domain=7, Message='internal error: client socket is closed')","timestamp":"2023-10-31T20:03:26.521711Z","uid":"388fd212-9187-488b-9989-43d2f19368f1"}
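
When triaging, it can help to pull the libvirt error out of the virt-launcher JSON log entry shown above. The following is a minimal sketch, assuming the reason field always embeds the error as virError(Code=..., Domain=..., Message='...') as it does in this one entry; the helper name parse_migration_failure is hypothetical.

```python
import json
import re

# The virt-launcher log entry quoted in this report, verbatim.
LOG_LINE = r'''{"component":"virt-launcher","kind":"","level":"error","msg":"Live migration failed.","name":"test-vm","namespace":"default","pos":"live-migration-source.go:718","reason":"error encountered during MigrateToURI3 libvirt api call: virError(Code=1, Domain=7, Message='internal error: client socket is closed')","timestamp":"2023-10-31T20:03:26.521711Z","uid":"388fd212-9187-488b-9989-43d2f19368f1"}'''

# Assumed format of the embedded libvirt error, based on the entry above.
VIR_ERROR = re.compile(r"virError\(Code=(\d+), Domain=(\d+), Message='(.*)'\)")


def parse_migration_failure(line):
    """Return the VMI and libvirt error details, or None if the line does not match."""
    entry = json.loads(line)
    if entry.get("level") != "error":
        return None
    match = VIR_ERROR.search(entry.get("reason", ""))
    if match is None:
        return None
    code, domain, message = match.groups()
    return {
        "vmi": entry.get("name"),
        "namespace": entry.get("namespace"),
        "code": int(code),
        "domain": int(domain),
        "message": message,
    }


if __name__ == "__main__":
    print(parse_migration_failure(LOG_LINE))
```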

            akalenyu Alex Kalenyuk
            ycui@redhat.com Ying Cui
            Jenia Peimer Jenia Peimer