OpenShift Virtualization / CNV-34724

Live Migration fails after volume hotplug


    • Bug
    • Resolution: Duplicate
    • Blocker
    • CNV v4.14.1
    • CNV v4.14.0
    • CNV Storage
    • None
    • True
    • Block the OCP upgrade if a VM has a hotplug disk, because live migration fails.
    • False
    • No
    • Need to add automation
    • Missing test

      Description of problem:

      Live migration is no longer possible after a VMI has had a hotplug volume added and then removed.
      Although this is not directly related to hypershift/kubevirt, the issue was discovered while testing hypershift/kubevirt, because we use hotplug to provide volumes to the worker node VMs. After using kubevirt-csi to hotplug a volume and then removing that volume, we noticed that the VMIs later failed to live migrate.
      It is trivial to reproduce this error outside of hypershift/kubevirt using a Fedora VM.

      Version-Release number of selected component (if applicable):

      CNV 4.14

      How reproducible:


      Steps to Reproduce:

      See the live-migration-failure-script.sh script attached to this issue to reproduce this easily. Below are the general steps that the script performs.
      This was reproduced using ODF 4.13 on OCP 4.14.0 with the latest CNV 4.14.0 pre-release.
      1. Create a VM and start it.
      2. Live migrate the VMI to prove that it is live migratable.
      3. Add a hotplug volume (RWX) to the VMI.
      4. Remove the hotplug volume from the VMI.
      5. Live migration is now permanently broken for the VMI.
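
The steps above can be sketched as a command sequence. This is not the attached live-migration-failure-script.sh, only a minimal illustration of roughly equivalent oc and virtctl invocations. The manifest file name (vm.yaml) and the PVC name (hotplug-pvc) are hypothetical placeholders; the VM name test-vm is taken from the log output in this issue. The sketch only builds and prints the commands. A real run would also need to wait for each migration and hotplug operation to finish before moving on.

```python
import shlex


def reproduction_commands(vm="test-vm", volume="hotplug-pvc", manifest="vm.yaml"):
    """Return the reproduction steps as argv lists.

    `volume` and `manifest` are hypothetical placeholders: an existing RWX PVC
    and a VirtualMachine manifest for a Fedora VM, respectively.
    """
    return [
        ["oc", "create", "-f", manifest],                             # step 1: create the VM
        ["virtctl", "start", vm],                                     # step 1: start it
        ["virtctl", "migrate", vm],                                   # step 2: should succeed
        ["virtctl", "addvolume", vm, f"--volume-name={volume}"],      # step 3: hotplug
        ["virtctl", "removevolume", vm, f"--volume-name={volume}"],   # step 4: unplug
        ["virtctl", "migrate", vm],                                   # step 5: reported to fail
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch is safe to execute.
    for argv in reproduction_commands():
        print(shlex.join(argv))
```

Building the commands as argv lists keeps the sketch usable both for printing and for passing to subprocess.run once the waiting logic between steps is added.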

      Actual results:

      Live migration fails after adding and removing an RWX hotplug volume.

      Expected results:

      Live migration should continue to work after hotplug

      Additional info:

      The qemu log on the source pod reports the following after the migration fails.
      #2023-10-30T19:53:14.880648Z qemu-kvm: qemu_savevm_state_complete_precopy_non_iterable: bdrv_inactivate_all() failed (-1)
      #2023-10-30T19:53:15.053395Z qemu-kvm: Unable to read from socket: Bad file descriptor
      #2023-10-30T19:53:15.053477Z qemu-kvm: Unable to read from socket: Bad file descriptor
      #2023-10-30T19:53:15.053500Z qemu-kvm: Unable to read from socket: Bad file descriptor
      The libvirt logs on the source only indicate that the migration failed due to an expected error.
      {"component":"virt-launcher","kind":"","level":"error","msg":"Live migration failed.","name":"test-vm","namespace":"default","pos":"live-migration-source.go:718","reason":"error encountered during MigrateToURI3 libvirt api call: virError(Code=1, Domain=7, Message='internal error: client socket is closed')","timestamp":"2023-10-31T20:03:26.521711Z","uid":"388fd212-9187-488b-9989-43d2f19368f1"}
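
When triaging similar reports, the two signatures above can be checked for mechanically. The following is a minimal sketch that assumes the qemu log and the virt-launcher log have already been collected as text. The helper names are hypothetical, and the sample inputs are the log lines quoted in this issue.

```python
import json

# Substring taken from the qemu log line quoted in this issue.
QEMU_MARKER = "bdrv_inactivate_all() failed"

SAMPLE_QEMU = """\
#2023-10-30T19:53:14.880648Z qemu-kvm: qemu_savevm_state_complete_precopy_non_iterable: bdrv_inactivate_all() failed (-1)
#2023-10-30T19:53:15.053395Z qemu-kvm: Unable to read from socket: Bad file descriptor
"""

SAMPLE_LAUNCHER = r'''{"component":"virt-launcher","kind":"","level":"error","msg":"Live migration failed.","name":"test-vm","namespace":"default","pos":"live-migration-source.go:718","reason":"error encountered during MigrateToURI3 libvirt api call: virError(Code=1, Domain=7, Message='internal error: client socket is closed')","timestamp":"2023-10-31T20:03:26.521711Z","uid":"388fd212-9187-488b-9989-43d2f19368f1"}'''


def qemu_inactivate_failures(qemu_log):
    """Return the qemu log lines that report a bdrv_inactivate_all() failure."""
    return [line.strip() for line in qemu_log.splitlines() if QEMU_MARKER in line]


def migration_failure_reason(launcher_line):
    """Return the 'reason' field of a 'Live migration failed.' entry, else None."""
    try:
        entry = json.loads(launcher_line)
    except json.JSONDecodeError:
        return None  # not a JSON-formatted log line
    if not isinstance(entry, dict):
        return None
    if entry.get("level") == "error" and entry.get("msg") == "Live migration failed.":
        return entry.get("reason")
    return None


if __name__ == "__main__":
    print(qemu_inactivate_failures(SAMPLE_QEMU))
    print(migration_failure_reason(SAMPLE_LAUNCHER))
```

Matching on the qemu message is a heuristic only; the same message could appear for other block-layer failures, so a hit is a hint rather than proof that this bug is the cause.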

            rhn-support-awels Alexander Wels
            rhn-engineering-dvossel David Vossel
            Jenia Peimer Jenia Peimer
