OpenShift Virtualization / CNV-48245

Incorrectly editing DataVolumeTemplates of a VM can break the entire system


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • CNV v4.17.1
    • CNV v4.15.5
    • CNV Storage
    • CNV Storage 260
    • High

      Description of problem:

      
      This is related to the situation in CNV-47105 and builds on top of it.
      An incorrect edit of the VM YAML in the dataVolumeTemplates section can break the entire system.
      
      

      Version-Release number of selected component (if applicable):

      CNV 4.15.5
      

      How reproducible:

      100%
      

      Steps to Reproduce:

      1. Create a VM
      2. In the VM spec, find the dataVolumeTemplates section; it will look like this:
      
      spec:
        dataVolumeTemplates:
          - apiVersion: cdi.kubevirt.io/v1beta1
            kind: DataVolume
            metadata:
              creationTimestamp: null
              name: rhel8
            spec:
              source:
                pvc:
                  name: golden-template
                  namespace: my-templates
              storage:
                resources:
                  requests:
                    storage: '32212254720'
      
      3. Delete the "source" part of the DataVolume above, so that it looks like this:
      
        dataVolumeTemplates:
          - apiVersion: cdi.kubevirt.io/v1beta1
            kind: DataVolume
            metadata:
              creationTimestamp: null
              name: rhel8
            spec:
              storage:
                resources:
                  requests:
                    storage: '32212254720'
      
      NOTE: the DV above is now invalid; it does not pass the DataVolume webhook validation, but unfortunately it passes VM validation, so the system accepts this invalid DV.
      
      4. Snapshot the VM
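
      For reference, the snapshot in this step can be requested with a manifest along these lines (a sketch: the snapshot.kubevirt.io/v1beta1 API version and the snapshot name are assumptions, the VM name and namespace are taken from the example above):
      
      apiVersion: snapshot.kubevirt.io/v1beta1
      kind: VirtualMachineSnapshot
      metadata:
        name: rhel8-snapshot
        namespace: my-templates
      spec:
        source:
          apiGroup: kubevirt.io
          kind: VirtualMachine
          name: rhel8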
      
      5. Note that the DVT with the missing source is propagated to the VirtualMachineSnapshotContent
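
      An excerpt of the resulting VirtualMachineSnapshotContent, showing the invalid template carried over verbatim (a sketch of the relevant fields only; volumeBackups and status are omitted):
      
      spec:
        source:
          virtualMachine:
            metadata:
              name: rhel8
            spec:
              dataVolumeTemplates:
                - apiVersion: cdi.kubevirt.io/v1beta1
                  kind: DataVolume
                  metadata:
                    name: rhel8
                  spec:
                    storage:
                      resources:
                        requests:
                          storage: '32212254720'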
      
      6. Try to restore that snapshot
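
      The restore that triggers the problem can be created with a manifest like this (again a sketch; the restore name and the snapshot name follow the example above):
      
      apiVersion: snapshot.kubevirt.io/v1beta1
      kind: VirtualMachineRestore
      metadata:
        name: rhel8-restore
        namespace: my-templates
      spec:
        target:
          apiGroup: kubevirt.io
          kind: VirtualMachine
          name: rhel8
        virtualMachineSnapshotName: rhel8-snapshot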
      
      

      Actual results:

      - The virt-controller is in a crash loop and nothing else works properly:
      
      goroutine 1175 [running]:
      k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0020fbac0?})
        /remote-source/app/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:56 +0xd7
      panic({0x1d98700, 0x34b5d50})
        /usr/lib/golang/src/runtime/panic.go:884 +0x213
      kubevirt.io/kubevirt/pkg/storage/snapshot.(*vmRestoreTarget).createDataVolume(_, {{{0xc0031e93b0, 0xa}, {0xc0031e4468, 0x17}}, {{0xc0031cb880, 0x35}, {0x0, 0x0}, {0x0, ...}, ...}, ...})
        /remote-source/app/pkg/storage/snapshot/restore.go:734 +0xd6
      kubevirt.io/kubevirt/pkg/storage/snapshot.(*vmRestoreTarget).reconcileDataVolumes(0xc0034df668)
        /remote-source/app/pkg/storage/snapshot/restore.go:607 +0x20a
      kubevirt.io/kubevirt/pkg/storage/snapshot.(*vmRestoreTarget).Reconcile(0xc0034df668?)
        /remote-source/app/pkg/storage/snapshot/restore.go:407 +0x3d
      kubevirt.io/kubevirt/pkg/storage/snapshot.(*VMRestoreController).updateVMRestore(0xc002752fc0, 0xc0033fc6e0)
        /remote-source/app/pkg/storage/snapshot/restore.go:167 +0x910
      kubevirt.io/kubevirt/pkg/storage/snapshot.(*VMRestoreController).processVMRestoreWorkItem.func1({0xc0030d6480, 0x2b})
        /remote-source/app/pkg/storage/snapshot/restore_base.go:165 +0x245
      kubevirt.io/kubevirt/pkg/virt-controller/watch/util.ProcessWorkItem.func1({0x254eae8?, 0xc00152cd00}, 0x41c490?, {0x1cb1260?, 0xc0020fbac0})
        /remote-source/app/pkg/virt-controller/watch/util/util.go:55 +0x1ae
      kubevirt.io/kubevirt/pkg/virt-controller/watch/util.ProcessWorkItem({0x254eae8, 0xc00152cd00}, 0x0?)
        /remote-source/app/pkg/virt-controller/watch/util/util.go:69 +0x4e
      kubevirt.io/kubevirt/pkg/storage/snapshot.(*VMRestoreController).processVMRestoreWorkItem(0x0?)
        /remote-source/app/pkg/storage/snapshot/restore_base.go:152 +0x46
      kubevirt.io/kubevirt/pkg/storage/snapshot.(*VMRestoreController).vmRestoreWorker(...)
        /remote-source/app/pkg/storage/snapshot/restore_base.go:147
      k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
        /remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:157 +0x3e
      k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0?, {0x2526d80, 0xc003af7050}, 0x1, 0xc00294b7a0)
        /remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:158 +0xb6
      k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0x0?)
        /remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:135 +0x89
      k8s.io/apimachinery/pkg/util/wait.Until(0x0?, 0x0?, 0x0?)
        /remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:92 +0x25
      created by kubevirt.io/kubevirt/pkg/storage/snapshot.(*VMRestoreController).Run
        /remote-source/app/pkg/storage/snapshot/restore_base.go:138 +0x7b4
      
      The DV itself does not pass DataVolume webhook validation when the controller tries to create it:
      
      {"component":"virt-controller","level":"info","msg":"re-enqueuing VirtualMachine my-templates/rhel8","pos":"vm.go:281","reason":"Error encountered while creating DataVolumes: failed to create DataVolume: admission webhook \"datavolume-validate.cdi.kubevirt.io\" denied the request:  Data volume should have either Source or SourceRef, or be externally populated","timestamp":"2024-09-10T04:08:29.668009Z"}
      
      Deleting the restore object makes the system recover, but the VM is left in a broken state.
      

      Expected results:

      - Validate DataVolumeTemplates when the VM spec is updated
      - Do not crash due to incorrect edits
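
      For context, the webhook message quoted in the log above accepts a DataVolume only if it has a source, a sourceRef, or is marked as externally populated. A minimal sketch of what a valid template could look like if the deleted pvc source were replaced by a sourceRef (the DataSource name and namespace are assumptions):
      
        dataVolumeTemplates:
          - apiVersion: cdi.kubevirt.io/v1beta1
            kind: DataVolume
            metadata:
              name: rhel8
            spec:
              sourceRef:
                kind: DataSource
                name: rhel8
                namespace: openshift-virtualization-os-images
              storage:
                resources:
                  requests:
                    storage: '32212254720'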
      

      Additional info:

      
      I am unsure whether an incorrect DVT like this can only be the result of a user-initiated edit; there may be some flow that produces it. It was seen in the field.
      
      

            skagan@redhat.com Shelly Kagan
            rhn-support-gveitmic Germano Veit Michel
            Dalia Frank