OpenShift Virtualization / CNV-47106

Unable to start VM after stuck/failed VirtualMachineRestore


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • CNV v4.16.4
    • CNV v4.16.1
    • CNV Storage
    • Storage Core Sprint 259, CNV Storage 260, Storage Core Sprint 261, Storage Core Sprint 262
    • None

      Description of problem:

      If a VirtualMachineRestore runs into a problem, such as the one described in https://issues.redhat.com/browse/CNV-47105, it is very difficult for the user to get the VM back.
      
      After the VirtualMachineRestore object is deleted, the system does not reconcile: the VM still points to the wrong disks and keeps a leftover .status.restoreInProgress.

      Version-Release number of selected component (if applicable):

      4.16.1

      How reproducible:

      Always

      Steps to Reproduce:

      1. Get into a situation where a VirtualMachineRestore hangs and cannot progress. I'm using https://issues.redhat.com/browse/CNV-47105 as an example:
      
      $ oc get vmrestore resotre-snapshot-cyan-cockroach-53-1724733464456
      NAME                                               TARGETKIND       TARGETNAME          COMPLETE   RESTORETIME   ERROR
      resotre-snapshot-cyan-cockroach-53-1724733464456   VirtualMachine   rhel8-aqua-asp-20   false
      
      2. Delete the VirtualMachineRestore object
      
      Now there are two problems:
      
      A) The VM is not switched back to the volumes it used before the restore; it is stuck pointing to the new restore-xyz volumes, which may or may not have been created.
      
      B) A leftover .status.restoreInProgress prevents the VM from starting and has to be removed with a manual patch:
      
      $ oc get vm rhel8-aqua-asp-20 -o yaml | yq '.status.restoreInProgress'
      "resotre-snapshot-cyan-cockroach-53-1724733464456"
      
      $ virtctl start rhel8-aqua-asp-20
      Error starting VirtualMachine Internal error occurred: admission webhook "virtualmachine-validator.kubevirt.io" denied the request: Cannot start VM until restore "resotre-snapshot-cyan-cockroach-53-1724733464456" completes
      
      The system should recover and reconcile from such scenarios on its own, without support intervention.
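Until the system handles this itself, the manual patch mentioned in problem B could look roughly like the sketch below. It is a hypothetical workaround, not a documented procedure: it assumes the oc client's patch command accepts --subresource (as recent kubectl-based clients do), that the VirtualMachine CRD exposes status as a subresource, and that the caller is allowed to patch virtualmachines/status. The DRY_RUN variable is an illustrative convenience, not part of oc.

```shell
# Hypothetical workaround sketch: clear a stale .status.restoreInProgress
# left on a VM after its VirtualMachineRestore was deleted.
# Assumptions: oc patch accepts --subresource, and the caller may patch
# virtualmachines/status.
set -euo pipefail

# VM to fix; defaults to the VM from this report.
VM_NAME="${1:-rhel8-aqua-asp-20}"
# DRY_RUN=1 (default) only prints the command; DRY_RUN=0 runs it.
DRY_RUN="${DRY_RUN:-1}"

# JSON Patch (RFC 6902) that removes the leftover restore marker.
# Note: a "remove" operation fails if the field is already absent.
PATCH='[{"op":"remove","path":"/status/restoreInProgress"}]'

# Assuming status is a subresource of the VM CRD, status changes sent to
# the main resource are ignored, so target the status subresource explicitly.
cmd=(oc patch vm "$VM_NAME" --subresource=status --type=json -p "$PATCH")

if [ "$DRY_RUN" = "1" ]; then
  # Print the command for review instead of executing it.
  printf '%s\n' "${cmd[*]}"
else
  "${cmd[@]}"
fi
```

Run it with DRY_RUN=0 and the VM name as the first argument to actually apply the patch. This only addresses problem B: the VM spec still references the restore-xyz volumes from problem A, and those references would have to be corrected separately before the VM can start on its previous disks.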
      

      Actual results:

      VMs are unable to start.

      Expected results:

      It should be possible to abort a restore and start the VM in its previous state.

      Additional info:

       

            skagan@redhat.com Shelly Kagan
            rhn-support-gveitmic Germano Veit Michel
            Dalia Frank
