OpenShift Virtualization / CNV-25170

[2168188] [4.12] VM with datasource with non-existing pvc won't start after datasource fix

    • High
    • None

      +++ This bug was initially created as a clone of Bug #2156517 +++

      Description of problem:
      Changing a dataSource PVC reference (spec.source.pvc.name) to a correct one after a wrong name was given leaves the VM that uses that dataSource in the same "Stopped" status.

      Version-Release number of selected component (if applicable):
      4.12

      How reproducible:
      100%

      Steps to Reproduce:
      1. Create a dataSource with a wrong spec.source.pvc.name
      2. Create a VM using the dataSource
      3. Edit the dataSource to an existing PVC name
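
      The steps above can be sketched as manifests. This is a minimal illustration, not the reporter's actual definitions (those were attached separately): every name, namespace, and size below is hypothetical, and the VM is assumed to reference the dataSource through a dataVolumeTemplate with a sourceRef.

```python
import json

# Hypothetical names, chosen only for illustration.
NAMESPACE = "golden-images"
DATASOURCE_NAME = "example-datasource"


def make_datasource(pvc_name):
    """Build a CDI DataSource manifest whose source points at pvc_name."""
    return {
        "apiVersion": "cdi.kubevirt.io/v1beta1",
        "kind": "DataSource",
        "metadata": {"name": DATASOURCE_NAME, "namespace": NAMESPACE},
        "spec": {"source": {"pvc": {"name": pvc_name, "namespace": NAMESPACE}}},
    }


def make_vm(name):
    """Build a VirtualMachine whose root disk is cloned via the DataSource."""
    dv_name = name + "-rootdisk"
    return {
        "apiVersion": "kubevirt.io/v1",
        "kind": "VirtualMachine",
        "metadata": {"name": name, "namespace": "default"},
        "spec": {
            "running": True,
            "dataVolumeTemplates": [{
                "metadata": {"name": dv_name},
                "spec": {
                    # Assumption: the VM refers to the DataSource by sourceRef.
                    "sourceRef": {
                        "kind": "DataSource",
                        "name": DATASOURCE_NAME,
                        "namespace": NAMESPACE,
                    },
                    "storage": {"resources": {"requests": {"storage": "30Gi"}}},
                },
            }],
            "template": {"spec": {
                "domain": {
                    "devices": {"disks": [{"name": "rootdisk",
                                           "disk": {"bus": "virtio"}}]},
                    "resources": {"requests": {"memory": "2Gi"}},
                },
                "volumes": [{"name": "rootdisk",
                             "dataVolume": {"name": dv_name}}],
            }},
        },
    }


if __name__ == "__main__":
    # Step 1: DataSource with a PVC name that does not exist.
    print(json.dumps(make_datasource("no-such-pvc"), indent=2))
    # Step 2: VM that uses the DataSource.
    print(json.dumps(make_vm("example-vm"), indent=2))
    # Step 3: the same DataSource, edited to point at an existing PVC.
    print(json.dumps(make_datasource("existing-pvc"), indent=2))
```

      The printed JSON can be applied with the usual cluster tooling; only spec.source.pvc.name differs between steps 1 and 3.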

      Actual results:
      The VM stays in the Stopped status and no DV is cloned.

      Expected results:
      The VM is running and a new DV is created.

      Additional info:
      1. After fixing the PVC name, stopping, restarting, or starting the VM won't help. Only deleting and recreating the VM will make it run.

      2. When a wrong PVC name is given, creating the VM creates a DV, but its status is "Unknown" with an error of "CloneWithoutSource". My guess is that, even after the name is changed, the dataSource is still looking at the old generation (version) of the dataSource, and only recreating it will fix it.

      — Additional comment from Roni Kishner on 2022-12-27 08:54:59 UTC —

      Can be verified with cnv-tests -> test_missing_golden_image_pvc

      — Additional comment from Lee Yarwood on 2023-01-04 10:48:18 UTC —

      Why is this bug against `Infrastructure` and not `Virtualization` or `Storage`?

      (In reply to Roni Kishner from comment #0)
      >
      > Steps to Reproduce:
      > 1. Create a dataSource with a wrong spec.source.pvc.name
      > 2. Create a VM using the dataSource

      I assume you're using DataVolumeTemplates here? Could you provide a complete example definition?

      > 3. Edit the dataSource to an existing PVC name
      >
      > [..]
      >
      > Additional info:
      > 1. After fixing the PVC name stopping/restarting/starting the VM won't help.
      > only deleting and creating the VM will make it run.
      >
      > 2. When a wrong PVC name is given, creating the VM will create a DV but it's
      > status is "Unknown" with an Error of "CloneWithoutSource", my guess is, even
      > after changing the naming, the dataSource is still looking at the old
      > generation (version) of the dataSource and only recreating it will fix it.

      I'm pretty sure this is expected behaviour: the DataVolumeTemplate -> DataSource -> DataVolume -> PVC creation flow is one shot. Any modification to the DataSource referred to by the DataVolumeTemplate then requires a rebuild to correct. Otherwise, each update to a DataSource from a DataImportCron that, for example, tracks `rhel9` would cause a rebuild of all VM PVCs referring to the DataSource. Happy to be corrected here, but I can't see an issue with this behaviour.
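
      To make the "one shot" claim concrete, here is a toy model of the flow as described in this comment. It is not CDI or KubeVirt code: the classes, the function, and the PVC names are all hypothetical, and the only details taken from the report are the "Unknown" status and the "CloneWithoutSource" error.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    pvc_name: str


@dataclass
class DataVolume:
    source_pvc: str  # copied from the DataSource once, at creation time
    phase: str
    reason: str


def create_data_volume(source, existing_pvcs):
    """Resolve the DataSource to a PVC name once and record the outcome."""
    if source.pvc_name in existing_pvcs:
        return DataVolume(source.pvc_name, "Succeeded", "")
    return DataVolume(source.pvc_name, "Unknown", "CloneWithoutSource")


def demo():
    existing_pvcs = {"rhel9-golden"}
    source = DataSource(pvc_name="rhel9-typo")

    # The VM is created while the DataSource still has the wrong name.
    first = create_data_volume(source, existing_pvcs)

    # Editing the DataSource later does not touch the DataVolume that
    # was already created from it.
    source.pvc_name = "rhel9-golden"

    # Only a rebuild (delete and recreate) resolves the DataSource again.
    rebuilt = create_data_volume(source, existing_pvcs)
    return first, rebuilt


if __name__ == "__main__":
    first, rebuilt = demo()
    print(first)
    print(rebuilt)
```

      In this model the first DataVolume keeps pointing at the wrong name after the edit, which matches the observation that only deleting and recreating the VM helps.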

      — Additional comment from Roni Kishner on 2023-01-04 11:27:44 UTC —

      (In reply to Lee Yarwood from comment #2)
      > Why is this bug against `Infrastructure` and not `Virtualization` or
      > `Storage`?
      >
      Changed to 'Storage'.

      >
      > I assume you're using DataVolumeTemplates here? Could you provide a complete
      > example definition?
      >
      Will attach definitions.
      >
      > I'm pretty sure this is expected behaviour, the DataVolumeTemplate ->
      > DataSource -> DataVolume -> PVC creation flow is one shot. Any modifications
      > to the DataSource referred to be the DataVolumeTemplate then requiring a
      > rebuild to correct. Otherwise each update to a DataSource from a
      > DataImportCron that for example tracks `rhel9` would cause a rebuild of all
      > VM PVCs referring to the DataSource. Happy to be corrected here but I can't
      > see an issue with this behaviour.

      Up until now, if a VM creation failed because of a wrong reference in the DataSource, editing it to a working reference would fix the issue and the VM would start.
      So I'm assuming that either something changed in the behaviour, or it's a bug introduced by another change (maybe of the garbage collector?).

      Also, I think it makes sense to retry creating a DV/PVC for a VM if it failed before and the DataSource changed.

      — Additional comment from Roni Kishner on 2023-01-04 11:56:47 UTC —

      — Additional comment from Roni Kishner on 2023-01-04 11:57:10 UTC —

      — Additional comment from Lee Yarwood on 2023-01-04 14:30:23 UTC —

      (In reply to Roni Kishner from comment #3)
      > (In reply to Lee Yarwood from comment #2)
      > >
      > > I'm pretty sure this is expected behaviour, the DataVolumeTemplate ->
      > > DataSource -> DataVolume -> PVC creation flow is one shot. Any modifications
      > > to the DataSource referred to be the DataVolumeTemplate then requiring a
      > > rebuild to correct. Otherwise each update to a DataSource from a
      > > DataImportCron that for example tracks `rhel9` would cause a rebuild of all
      > > VM PVCs referring to the DataSource. Happy to be corrected here but I can't
      > > see an issue with this behaviour.
      >
      > Up until now if a VM creation failed with wrong reference in the DataSource,
      > editing it to a working reference would fix the issue and the VM would start.
      > So i'm assuming something either changed in the behaviour, or it's a bug
      > created from another change (maybe of the garbage collector?)

      Yeah I wonder if the DV being GC'd means a PVC is created but never populated? You could prove that by disabling DV GC FWIW:

      https://github.com/kubevirt/containerized-data-importer/blob/main/doc/datavolumes.md#garbage-collection-of-successfully-completed-datavolumes

      Does deleting the created PVC also cause the DV to be recreated by KubeVirt's virt-controller?

      > Also I think it makes sense to re-try to create a DV/PVC for a VM if it
      > failed before and the DataSource changed.

      Triggered by the virt-controller in KubeVirt? I'm not sure that's entirely valid as it would mean we'd need to watch DataSources associated with failed VirtualMachines.

      — Additional comment from Roni Kishner on 2023-01-04 18:39:38 UTC —

      (In reply to Lee Yarwood from comment #6)

      > Yeah I wonder if the DV being GC'd means a PVC is created but never
      > populated? You could prove that by disabling DV GC FWIW:
      >
      > https://github.com/kubevirt/containerized-data-importer/blob/main/doc/
      > datavolumes.md#garbage-collection-of-successfully-completed-datavolumes

      I tried both with dataVolumeTTLSeconds=-1 and without it, and got the same result: the DV still existed. That makes me think it's not working properly. I still need to try with cdi.kubevirt.io/storage.deleteAfterCompletion: "false" as well.
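
      For reference, the two garbage-collection settings mentioned here can be sketched as follows. This is an illustration only: placing dataVolumeTTLSeconds under spec.config of the CDI custom resource is an assumption, the helper function is hypothetical, and on OpenShift Virtualization the CDI resource may be managed by an operator that reverts direct edits.

```python
import copy
import json

# Assumption: the TTL lives under spec.config of the CDI custom resource.
# -1 is the value used in the comment above to turn DataVolume GC off.
CDI_GC_DISABLED_PATCH = {"spec": {"config": {"dataVolumeTTLSeconds": -1}}}

GC_ANNOTATION = "cdi.kubevirt.io/storage.deleteAfterCompletion"


def keep_after_completion(dv_template):
    """Return a copy of a dataVolumeTemplate annotated to opt out of GC."""
    annotated = copy.deepcopy(dv_template)
    metadata = annotated.setdefault("metadata", {})
    metadata.setdefault("annotations", {})[GC_ANNOTATION] = "false"
    return annotated


if __name__ == "__main__":
    # Body for a merge patch against the CDI resource.
    print(json.dumps(CDI_GC_DISABLED_PATCH))
    # Per-DataVolume alternative: annotate the template itself.
    template = {"metadata": {"name": "example-vm-rootdisk"}, "spec": {}}
    print(json.dumps(keep_after_completion(template), indent=2))
```

      The annotation value is the string "false", not a boolean, as in the comment above.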

      > Does deleting the created PVC also cause the DV to be recreated by
      > KubeVirt's virt-controller?

      Yes, the DV and PVC are linked, so deleting one of them causes the other to be deleted; it doesn't matter whether I delete the DV or the PVC.

      — Additional comment from Yan Du on 2023-02-01 13:19:17 UTC —

      Arnon, do we have progress for the bug?

      — Additional comment from Arnon Gilboa on 2023-02-01 13:47:55 UTC —

      Yes Yan, I'm on it.

      — Additional comment from Roni Kishner on 2023-02-07 18:29:53 UTC —

      When the fix is verified, we need to backport it to 4.12.1.

              agilboa@redhat.com Arnon Gilboa
              yadu1@redhat.com Yan Du