OpenShift Virtualization / CNV-26063

[2172612] [4.13] VMSnapshot and WaitForFirstConsumer storage: VMRestore is not Complete


    • Sprint: Storage Core Sprint 233
    • Priority: High

      +++ This bug was initially created as a clone of Bug #2149654 +++

      Description of problem:
      VMRestore doesn't get to the Complete state,
      restore DV stays WaitForFirstConsumer,
      restore PVC is Pending
      restore VM is Stopped and not Ready
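
      For reference, the stuck state can be checked with standard commands (resource names below are illustrative, taken from the YAMLs in Additional info):

      $ oc get vmrestore restore-my-vm    # COMPLETE stays false
      $ oc get dv                         # restore DV phase stays WaitForFirstConsumer
      $ oc get pvc                        # restore PVC stays Pending
      $ oc get vm vm-restored             # restored VM is Stopped and not Ready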

      Version-Release number of selected component (if applicable):
      4.12

      How reproducible:
      Always on SNO cluster with snapshot capable storage with WaitForFirstConsumer volumeBindingMode (TopoLVM storage in our case - lvms-vg1)
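
      The binding mode of the storage class in use can be confirmed with, for example:
      $ oc get sc <storage-class-name> -o jsonpath='{.volumeBindingMode}'   # expect WaitForFirstConsumer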

      Steps to Reproduce:
      1. Create a VM - VM is Running
      2. Create a VMSnapshot - VMSnapshot is ReadyToUse
      3. Create a VMRestore
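
      For example, using the manifests from Additional info below:
      $ oc create -f vm.yaml          # step 1: VM becomes Running
      $ oc create -f snap.yaml        # step 2: VMSnapshot becomes ReadyToUse
      $ oc create -f vmrestore.yaml   # step 3: VMRestore is created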

      Actual results:
      VMRestore is not Complete

      $ oc get vmrestore
      NAME            TARGETKIND       TARGETNAME    COMPLETE   RESTORETIME   ERROR
      restore-my-vm   VirtualMachine   vm-restored   false  

      Expected results:
      VMRestore is Complete (PVC Bound, DV Succeeded and garbage collected)

      Workaround (and one more issue):
      1. Start the restored VM
      2. See the VM is Ready and Running, DV succeeded, PVC Bound
      3. See the VMRestore is still not Complete:

      $ oc get vmrestore
      NAME            TARGETKIND       TARGETNAME    COMPLETE   RESTORETIME   ERROR
      restore-my-vm   VirtualMachine   vm-restored   false  

      $ oc describe vmrestore restore-my-vm | grep Events -A 10
      Events:
        Type     Reason                      Age                    From                Message
        ----     ------                      ----                   ----                -------
        Warning  VirtualMachineRestoreError  4m4s (x23 over 4m21s)  restore-controller  VirtualMachineRestore encountered error invalid RunStrategy "Always"

      4. See the restored VM runStrategy:
      $ oc get vm vm-restored -oyaml | grep running
          running: true
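
      (Side note: in KubeVirt, spec.running: true is the legacy equivalent of runStrategy: Always, which is presumably why the error message refers to RunStrategy "Always" even though the restored VM only sets the running field:)
      $ oc get vm vm-restored -o jsonpath='{.spec.runStrategy}'   # expected to be empty, only spec.running is set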

      ***
      PLEASE NOTE that on OCS storage with Immediate volumeBindingMode on a multi-node cluster, the restored VM gets "running: false" even though the source VM had it set to "true"; in that case we do not get the above error and the VMRestore becomes Complete:
      $ oc get vm vm-restored-ocs -oyaml | grep running
        running: false
      ***

      5. Stop the restored VM
      6. See the VMRestore is Complete:
      $ oc get vmrestore
      NAME            TARGETKIND       TARGETNAME    COMPLETE   RESTORETIME   ERROR
      restore-my-vm   VirtualMachine   vm-restored   true       1s            
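
      The start/stop in the workaround steps can be done, for example, with virtctl:
      $ virtctl start vm-restored   # step 1: start the restored VM
      $ virtctl stop vm-restored    # step 5: stop the restored VM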

      Additional info:

      VM yaml: 

      $ cat vm.yaml
      apiVersion: kubevirt.io/v1alpha3
      kind: VirtualMachine
      metadata:
        name: vm-cirros-source
        labels:
          kubevirt.io/vm: vm-cirros-source
      spec:
        dataVolumeTemplates:
        - metadata:
            name: cirros-dv-source
          spec:
            storage:
              resources:
                requests:
                  storage: 1Gi
              storageClassName: odf-lvm-vg1
            source:
              http:
                url: <cirros-0.4.0-x86_64-disk.qcow2>
        running: true
        template:
          metadata:
            labels:
              kubevirt.io/vm: vm-cirros-source
          spec:
            domain:
              devices:
                disks:
                - disk:
                    bus: virtio
                  name: datavolumev
              machine:
                type: ""
              resources:
                requests:
                  memory: 100M
            terminationGracePeriodSeconds: 0
            volumes:
            - dataVolume:
                name: cirros-dv-source
              name: datavolumev

      VMSnapshot yaml:

      $ cat snap.yaml
      apiVersion: snapshot.kubevirt.io/v1alpha1
      kind: VirtualMachineSnapshot
      metadata:
        name: my-vmsnapshot
      spec:
        source:
          apiGroup: kubevirt.io
          kind: VirtualMachine
          name: vm-cirros-source

      VMRestore yaml:

      $ cat vmrestore.yaml
      apiVersion: snapshot.kubevirt.io/v1alpha1
      kind: VirtualMachineRestore
      metadata:
        name: restore-my-vm
      spec:
        target:
          apiGroup: kubevirt.io
          kind: VirtualMachine
          name: vm-restored
        virtualMachineSnapshotName: my-vmsnapshot

      — Additional comment from Jenia Peimer on 2023-02-19 13:25:23 UTC —

      Just to keep the info in this BZ: this bug was discussed at the KubeVirt SIG-Storage meeting, and the current approach to fix it is to mark the VMRestore Complete when the DV is WaitForFirstConsumer (WFFC) and the PVC is Pending.
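
      In other words, a rough sketch of the proposed condition (not the actual controller code): the restore would be considered complete once the restored DV reports WaitForFirstConsumer and its PVC is still Pending, which can be observed with:
      $ oc get dv <restore-dv-name> -o jsonpath='{.status.phase}'     # WaitForFirstConsumer
      $ oc get pvc <restore-pvc-name> -o jsonpath='{.status.phase}'   # Pending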
