OpenShift Virtualization / CNV-34724

Live Migration fails after volume hotplug


    • Type: Bug
    • Resolution: Duplicate
    • Priority: Blocker
    • CNV v4.14.1
    • CNV v4.14.0
    • Storage Platform
    • None
    • True

      Blocks the OCP upgrade: if a VM has a hotplug disk, live migration fails.

    • False
    • Need to add automation
    • Missing test
    • No

      Description of problem:

      
      Live migration is no longer possible after a hotplug volume has been added to and then removed from a VMI.
      
      While this has nothing to do directly with hypershift/kubevirt, the issue was discovered while testing hypershift/kubevirt, which uses hotplug to provide volumes to the worker node VMs. After we used kubevirt-csi to hotplug a volume and then removed that volume, the VMIs later failed to live migrate.
      
      It is trivial to reproduce this error outside of hypershift/kubevirt using a Fedora VM.
      
      

      Version-Release number of selected component (if applicable):

      
      CNV 4.14
      
      

      How reproducible:

      
      100%
      
      

      Steps to Reproduce:

      
      See the live-migration-failure-script.sh script attached to this issue to reproduce this easily. Below are the general steps that script performs.
      
      This was reproduced using ODF 4.13 on OCP 4.14.0 with the latest CNV 4.14.0 pre-release.
      
      1. Create a VM and start it.
      2. Live migrate the VMI to prove that it is live migratable.
      3. Add a hotplug volume (RWX) to the VMI.
      4. Remove the hotplug volume from the VMI.
      5. Live migration is now permanently broken for the VMI.
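The steps above can be sketched as the following command sequence. This is a minimal sketch, not the attached script. The VM name `test-vm` comes from the virt-launcher log in this report. The manifest file `fedora-vm.yaml`, the RWX volume name `hotplug-rwx`, and the helpers `repro_commands` and `run` are hypothetical placeholders. The sketch assumes that `kubectl` and `virtctl` are on the PATH, that the manifest defines a stopped Fedora VM, and that the RWX DataVolume/PVC already exists. `virtctl migrate` returns once the migration request is created, so a real script must wait for each migration to finish before continuing; that polling is omitted. By default the commands are only printed (dry run).

```python
import shlex
import subprocess

VM_NAME = "test-vm"             # VMI name seen in the virt-launcher log of this report
VM_MANIFEST = "fedora-vm.yaml"  # hypothetical manifest defining a stopped Fedora VM
HOTPLUG_VOLUME = "hotplug-rwx"  # hypothetical name of an existing RWX DataVolume/PVC


def repro_commands(vm: str, manifest: str, volume: str) -> list[list[str]]:
    """Return the reproduction steps as argv lists (waiting between steps is omitted)."""
    return [
        ["kubectl", "apply", "-f", manifest],                        # 1. create the VM
        ["virtctl", "start", vm],                                    #    ...and start it
        ["virtctl", "migrate", vm],                                  # 2. baseline migration
        ["virtctl", "addvolume", vm, f"--volume-name={volume}"],     # 3. hotplug the RWX volume
        ["virtctl", "removevolume", vm, f"--volume-name={volume}"],  # 4. unplug it again
        ["virtctl", "migrate", vm],                                  # 5. the migration that fails
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print("+", shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(repro_commands(VM_NAME, VM_MANIFEST, HOTPLUG_VOLUME))
```

Keeping the steps as plain argv lists makes the sequence easy to review in dry-run mode before pointing it at a real cluster.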
      
      

      Actual results:

      
      Live migration fails after adding and removing an RWX hotplug volume.
      
      

      Expected results:

      
      Live migration should continue to work after a hotplug volume is added and removed.
      
      

      Additional info:

      
      The qemu log on the source pod reports the following after the migration fails.
      
      {code:java}
      #2023-10-30T19:53:14.880648Z qemu-kvm: qemu_savevm_state_complete_precopy_non_iterable: bdrv_inactivate_all() failed (-1)
      #2023-10-30T19:53:15.053395Z qemu-kvm: Unable to read from socket: Bad file descriptor
      #2023-10-30T19:53:15.053477Z qemu-kvm: Unable to read from socket: Bad file descriptor
      #2023-10-30T19:53:15.053500Z qemu-kvm: Unable to read from socket: Bad file descriptor
      {code}
      
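To check other source pods for the same symptom, you can scan the qemu log for the two messages quoted above. This is a sketch. The pattern names and the `scan_qemu_log` helper are hypothetical. Treating these messages as a reliable signature of this particular bug is an assumption, because the same messages could have other causes.

```python
import re

# Patterns copied from the qemu log quoted in this report (assumed signature).
FAILURE_PATTERNS = {
    "inactivate_failed": re.compile(r"bdrv_inactivate_all\(\) failed"),
    "socket_bad_fd": re.compile(r"Unable to read from socket: Bad file descriptor"),
}


def scan_qemu_log(lines):
    """Count how many lines match each known failure pattern."""
    counts = {name: 0 for name in FAILURE_PATTERNS}
    for line in lines:
        for name, pattern in FAILURE_PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

Because it accepts any iterable of strings, an open log file can be passed directly, for example `with open(path) as f: scan_qemu_log(f)`.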
      The libvirt logs on the source only indicate that the migration failed with a generic internal error (client socket is closed):
      
      {code:java}
      {"component":"virt-launcher","kind":"","level":"error","msg":"Live migration failed.","name":"test-vm","namespace":"default","pos":"live-migration-source.go:718","reason":"error encountered during MigrateToURI3 libvirt api call: virError(Code=1, Domain=7, Message='internal error: client socket is closed')","timestamp":"2023-10-31T20:03:26.521711Z","uid":"388fd212-9187-488b-9989-43d2f19368f1"}
      {code}
      
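The virt-launcher entry above is a single JSON object, so failed migrations can be picked out of a launcher log programmatically. This is a minimal sketch. It assumes one JSON object per log line. The helper name `migration_failure` is hypothetical, and the only field names it relies on (`component`, `msg`, `name`, `reason`) are those visible in the entry above.

```python
import json
from typing import Optional, Tuple


def migration_failure(line: str) -> Optional[Tuple[str, str]]:
    """Return (VMI name, reason) for a virt-launcher 'Live migration failed.' entry, else None."""
    try:
        entry = json.loads(line)
    except json.JSONDecodeError:
        return None  # not a structured (JSON) log line
    if not isinstance(entry, dict):
        return None
    if entry.get("component") != "virt-launcher" or entry.get("msg") != "Live migration failed.":
        return None
    return entry.get("name", ""), entry.get("reason", "")
```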

              Alexander Wels
              David Vossel
              Jenia Peimer
              Votes: 0
              Watchers: 11
