  1. OpenShift Virtualization
  2. CNV-50720

Fix migration race condition to prevent VM corruption

    • Bug
    • Resolution: Unresolved
    • Major
    • CNV v4.19.0
    • CNV v4.19.0
    • CNV Virt-Node
    • None
    • CNV Virt-Node Sprint 263, CNV Virt-Node Sprint 264, CNV Virt-Node Sprint 265, CNV Virt-Node Sprint 266, CNV Virt-Node Sprint 267, CNV Virt-Node Sprint 268
    • None

      Description of problem:

      While a VM is migrating, any crash that occurs after libvirt concludes the migration but before KubeVirt records the migration as completed causes KubeVirt to treat the migration as crashed and delete all pods associated with the VM. That includes the migration-destination pod, on which the VM is by then running normally. This can lead directly to data loss or corruption.
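
The decision described above can be illustrated with a toy model. This is only a sketch under stated assumptions: every type and function name below is hypothetical and none of it is KubeVirt code. It contrasts a recovery path that trusts only the recorded migration state with one possible guard that also checks whether the destination is already running. It is not meant to represent the actual fix.

```go
package main

import "fmt"

// All names in this file are hypothetical; this is a toy model of the
// race described in the report, not KubeVirt code.

// MigrationRecord is what the orchestrator has persisted about a migration.
type MigrationRecord struct {
	Completed bool // set only once completion has been recorded
}

// TargetState is what the hypervisor layer reports about the destination.
type TargetState struct {
	DomainRunning bool // true once the hypervisor has concluded the migration
}

// Action is the cleanup decision taken after a crash is detected.
type Action string

const (
	DeleteAllPods Action = "delete-all-pods"
	KeepTargetPod Action = "keep-target-pod"
)

// naiveRecover trusts only the recorded state. If the crash lands inside
// the window, Completed is still false and every pod is deleted, including
// the destination pod that is already running the VM.
func naiveRecover(rec MigrationRecord) Action {
	if !rec.Completed {
		return DeleteAllPods
	}
	return KeepTargetPod
}

// guardedRecover also consults the destination before deleting anything.
// This is one possible guard, shown purely for illustration.
func guardedRecover(rec MigrationRecord, tgt TargetState) Action {
	if rec.Completed || tgt.DomainRunning {
		return KeepTargetPod
	}
	return DeleteAllPods
}

func main() {
	// Crash inside the window: the hypervisor is done, completion not recorded.
	rec := MigrationRecord{Completed: false}
	tgt := TargetState{DomainRunning: true}

	fmt.Println("naive:", naiveRecover(rec))
	fmt.Println("guarded:", guardedRecover(rec, tgt))
}
```

In this model the two paths disagree only in the case the report describes, where the destination is running but completion has not yet been recorded.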
      
      

      Version-Release number of selected component (if applicable):

      
      

      How reproducible:

      This is extremely difficult to reproduce in practice because the window during which the VM is vulnerable is very short, usually on the order of 400 milliseconds.
      

      Steps to Reproduce:

      1. Artificially lengthen the problematic window; this is needed to be able to test both the bug and the fix.
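
One way to lengthen the window for testing is a test-only delay placed just before the step that records completion. The following is a minimal sketch, assuming a hypothetical environment variable `MIGRATION_RACE_DELAY` and hypothetical function names. It does not reflect any existing KubeVirt option or code path.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// parseDelay converts a duration string such as "2s" into a time.Duration.
// Empty, malformed, or negative input yields zero, so normal runs are unaffected.
func parseDelay(raw string) time.Duration {
	d, err := time.ParseDuration(raw)
	if err != nil || d < 0 {
		return 0
	}
	return d
}

// raceWindowDelay reads the hypothetical MIGRATION_RACE_DELAY variable.
func raceWindowDelay() time.Duration {
	return parseDelay(os.Getenv("MIGRATION_RACE_DELAY"))
}

// recordCompletion stands in for the step that marks a migration as
// completed. The sleep holds the vulnerable window open so that a crash
// can be injected while it is in progress.
func recordCompletion(mark func()) {
	time.Sleep(raceWindowDelay())
	mark()
}

func main() {
	os.Setenv("MIGRATION_RACE_DELAY", "50ms")

	start := time.Now()
	recordCompletion(func() { fmt.Println("migration recorded as completed") })
	fmt.Println("window held open for:", time.Since(start))
}
```

With the variable unset the delay is zero, so the extra window exists only in test runs that opt in.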
      

      Actual results:

      
      

      Expected results:

      
      

      Additional info:

      
      

              jelejosne Jed Lejosne
              sgott@redhat.com Stuart Gott
              Denys Shchedrivyi
              Votes: 0
              Watchers: 6

                Created:
                Updated: