OpenShift Virtualization / CNV-48692

[4.17] Unable to restore a snapshot if the original DataVolume clone source is from a namespace/pvc that was deleted

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • CNV v4.17.1
    • CNV v4.17.0, CNV v4.16.1, CNV v4.15.5
    • CNV Storage
    • None
    • Storage Core Sprint 259, Storage Core Sprint 261
    • Urgent
    • None

      Description of problem:

      The following sequence of events results in failure:
      
      1. Have some template/golden image as PVC in namespace X
      2. Allow cloning cross namespace (see additional info)
      3. Create a new VM in namespace Y, using the Clone PVC option with the image from namespace X (step 1)
      4. Snapshot this VM
      5. Delete namespace X
      6. Restore the snapshot of the VM
      
      The VirtualMachineRestore gets stuck, as it cannot create the DV anymore.
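
For context, the clone source is recorded in the VM's own spec, and a VM snapshot stores a copy of that spec. The fragment below is only a sketch of what such a VM might look like, assuming the disk was added as a dataVolumeTemplates entry; every name in it (example-vm, example-vm-disk, golden-image, namespace-x, namespace-y) is hypothetical and stands in for the objects from steps 1 and 3. The reference to namespace X under source.pvc is presumably what the restore trips over once that namespace is gone.

```yaml
# Sketch only, not taken from the affected cluster.
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: example-vm              # hypothetical VM name
  namespace: namespace-y        # namespace Y from step 3
spec:
  dataVolumeTemplates:
  - metadata:
      name: example-vm-disk     # hypothetical DataVolume name
    spec:
      source:
        pvc:
          name: golden-image    # hypothetical golden image PVC from step 1
          namespace: namespace-x  # namespace X, deleted in step 5
      storage:
        resources:
          requests:
            storage: 10Gi
```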

      Version-Release number of selected component (if applicable):

      4.16.1

      How reproducible:

      Always

      Steps to Reproduce:

      1. Create an empty PVC named my-disk in a namespace called my-images

      $ cat disk.yaml
      kind: PersistentVolumeClaim
      apiVersion: v1
      metadata:
        name: my-disk 
        namespace: my-images
      spec:
        accessModes:
          - ReadWriteOnce 
        resources:
          requests:
            storage: 10Gi 
        storageClassName: lvms-ssd
      
      $ oc get pvc -n my-images
      NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
      my-disk   Bound    pvc-839d1030-c5a7-4ee5-9dd3-4b2018cdcd1a   10Gi       RWO            lvms-ssd       <unset>                 57s
      
      2. Ensure you can clone from my-images to another namespace where VMs are created
      
      3. In the namespace where VMs are created (not my-images), create a new VM from template (I've used RHEL8, but not relevant)
       
       Disk Source: PVC (clone PVC)
       PVC Project: my-images
       PVC Name: my-disk

      The DV will look like this (I've created mine in the homelab namespace):

      spec:
        preallocation: false
        source:
          pvc:
            name: my-disk
            namespace: my-images
        storage:
          resources:
            requests:
              storage: 10Gi
          storageClassName: lvms-ssd
      
      4. Ensure the VM was created successfully
      
      5. In the Web Console, create a VM snapshot of the new VM
      
      $ oc get vmsnapshot
      NAME                         SOURCEKIND       SOURCENAME          PHASE       READYTOUSE   CREATIONTIME   ERROR
      snapshot-cyan-cockroach-53   VirtualMachine   rhel8-aqua-asp-20   Succeeded   true         4s        
      
      6. Now try to restore that snapshot, also in the Web Console
      
      7. All works
      
      8. Now delete the original my-images/my-disk (it's not really needed; the VM is a clone of it)
      
      $ oc delete pvc -n my-images my-disk 
      persistentvolumeclaim "my-disk" deleted
      $ oc delete project my-images
      project.project.openshift.io "my-images" deleted
      
      9. Try to restore the snapshot again; it gets stuck here:
      
      $ oc get virtualmachinerestore resotre-snapshot-cyan-cockroach-53-1724733464456 -o yaml
      apiVersion: snapshot.kubevirt.io/v1alpha1
      kind: VirtualMachineRestore
      metadata:
        creationTimestamp: "2024-08-27T04:37:45Z"
        generation: 5
        name: resotre-snapshot-cyan-cockroach-53-1724733464456
        namespace: homelab
        ownerReferences:
        - apiVersion: kubevirt.io/v1
          blockOwnerDeletion: false
          kind: VirtualMachine
          name: rhel8-aqua-asp-20
          uid: 7b75bc2b-e13a-455e-8d9a-5abceb3c957d
        resourceVersion: "37572385"
        uid: d0468b34-488e-4404-8623-906f28d0f7d0
      spec:
        target:
          apiGroup: kubevirt.io
          kind: VirtualMachine
          name: rhel8-aqua-asp-20
        virtualMachineSnapshotName: snapshot-cyan-cockroach-53
      status:
        complete: false
        conditions:
        - lastProbeTime: null
          lastTransitionTime: "2024-08-27T04:37:45Z"
          reason: 'admission webhook "virtualmachine-validator.kubevirt.io" denied the request:
            namespace my-images does not exist'
          status: "False"
          type: Progressing
        - lastProbeTime: null
          lastTransitionTime: "2024-08-27T04:37:45Z"
          reason: 'admission webhook "virtualmachine-validator.kubevirt.io" denied the request:
            namespace my-images does not exist'
          status: "False"
          type: Ready
        deletedDataVolumes:
        - restore-f5b3cb99-ed07-4597-b875-25fdbfbcd79b-disk-chocolate-pelican-74
        restores:
        - dataVolumeName: restore-d0468b34-488e-4404-8623-906f28d0f7d0-disk-chocolate-pelican-74
          persistentVolumeClaim: restore-d0468b34-488e-4404-8623-906f28d0f7d0-disk-chocolate-pelican-74
          volumeName: disk-chocolate-pelican-74
          volumeSnapshotName: vmsnapshot-4e5c7c55-ca8e-4a31-ae74-fc25dec54073-volume-disk-chocolate-pelican-74
      
      Now the user is unable to restore the VM from a backup because the original (now unrelated) namespace/PVC no longer exists.
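
The snapshot and restore in steps 5, 6 and 9 were done through the Web Console. For reproducing from the CLI, the manifests below are a rough equivalent and not the exact objects the console creates. The API version and spec fields are copied from the VirtualMachineRestore output above; the restore name restore-example is hypothetical, and the snapshot is assumed to be served at the same v1alpha1 version as the restore.

```yaml
# Sketch only: CLI equivalent of the Web Console snapshot (step 5).
apiVersion: snapshot.kubevirt.io/v1alpha1   # assumed to match the restore's version
kind: VirtualMachineSnapshot
metadata:
  name: snapshot-cyan-cockroach-53
  namespace: homelab
spec:
  source:
    apiGroup: kubevirt.io
    kind: VirtualMachine
    name: rhel8-aqua-asp-20
---
# Sketch only: CLI equivalent of the Web Console restore (steps 6 and 9).
apiVersion: snapshot.kubevirt.io/v1alpha1
kind: VirtualMachineRestore
metadata:
  name: restore-example   # hypothetical; the Web Console generates its own name
  namespace: homelab
spec:
  target:
    apiGroup: kubevirt.io
    kind: VirtualMachine
    name: rhel8-aqua-asp-20
  virtualMachineSnapshotName: snapshot-cyan-cockroach-53
```

Applying the first document before deleting my-images and the second one afterwards should lead to the same stuck restore as in step 9.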

      Actual results:

      Unable to restore snapshot of the VM using VirtualMachineRestore

      Expected results:

      Able to restore snapshot of VM

      Additional info:

      https://docs.openshift.com/container-platform/4.16/virt/storage/virt-enabling-user-permissions-to-clone-datavolumes.html
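
For step 2 of the reproduction, the linked documentation grants clone permission with a ClusterRole on the datavolumes/source subresource plus a RoleBinding in the source namespace. A minimal sketch for this setup follows, assuming the VMs live in homelab and are created with that namespace's default service account. The object names datavolume-cloner and allow-clone-to-homelab are illustrative and not taken from the affected cluster.

```yaml
# Sketch only: allow cloning DataVolume sources from my-images into homelab.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: datavolume-cloner          # illustrative name
rules:
- apiGroups: ["cdi.kubevirt.io"]
  resources: ["datavolumes/source"]
  verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: allow-clone-to-homelab     # illustrative name
  namespace: my-images             # source namespace holding my-disk
subjects:
- kind: ServiceAccount
  name: default
  namespace: homelab               # destination namespace where VMs are created
roleRef:
  kind: ClusterRole
  name: datavolume-cloner
  apiGroup: rbac.authorization.k8s.io
```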
      
      The customer (on 4.15) got a different error at the exact same place, as if the clone source is the problem:
      
      Failed to create restore DataVolume: admission webhook "datavolume-validate.cdi.kubevirt.io" denied the request: Data volume should have either Source or SourceRef, or be externally populated
      
      Their restore also fails at the step that creates the restore DV. They are on 4.15; the reproduction above is on the latest 4.16.1.
      
      This should work without a manual restore, as customers may need to urgently roll back their VMs.

       

            skagan@redhat.com Shelly Kagan
            rhn-support-gveitmic Germano Veit Michel
            Dalia Frank Dalia Frank