OpenShift Virtualization / CNV-31467

[2227066] Recreation of the boot source images as cached snapshots may have issues

    • High
    • No

      Description of problem:
      If the default storage class does not support snapshots, the boot source images
      created by the DataImportCron in the openshift-virtualization-os-images namespace are imported as DVs/PVCs.

      When you switch the default storage class to OCS, you can re-import the images by deleting the old DVs. Each DV/PVC is then re-imported, a VolumeSnapshot object is created from it, and the DV/PVC is removed automatically.

      Alex (akalenyu@redhat.com) looked at it and sees two issues:

      Issue 1: Snapshots are taken of PVCs that are still on the previous storage class (when changing the SC from HPP to OCS).

      Issue 2: When deleting the DVs on the old storage class, there may be a race in which the snapshot is created but the DV is not recreated.

      Version-Release number of selected component (if applicable):
      4.14

      How reproducible:
      Always

      Steps to Reproduce:

      1. Have a non-snapshotable default storage class (HPP)

      2. See that DVs/PVCs were imported

      $ oc get dv -A
      NAMESPACE                            NAME                          PHASE       PROGRESS   RESTARTS   AGE
      openshift-virtualization-os-images   centos-stream8-b9b768dcd73b   Succeeded   100.0%                18h
      openshift-virtualization-os-images   centos-stream9-362e1f1d9f11   Succeeded   100.0%                18h
      openshift-virtualization-os-images   centos7-680e9b4e0fba          Succeeded   100.0%                18h
      openshift-virtualization-os-images   fedora-f7cc15256f08           Succeeded   100.0%                18h
      openshift-virtualization-os-images   rhel8-0da894200daa            Succeeded   100.0%                18h
      openshift-virtualization-os-images   rhel9-b006ef7856b6            Succeeded   100.0%                18h

      3. Make HPP non-default, make OCS default

      $ oc patch storageclass ocs-storagecluster-ceph-rbd -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'
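
Step 3 also calls for making HPP non-default, which the command above does not show. A minimal sketch of one way to do both halves follows. The helper name default_class_patch is hypothetical, and hostpath-csi-basic is only an assumed name for the HPP storage class; substitute whatever `oc get storageclass` reports on your cluster.

```shell
# Build the patch that sets the default-class annotation to "true" or "false".
# default_class_patch is a hypothetical helper, not part of oc.
default_class_patch() {
  printf '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "%s"}}}' "$1"
}

# Usage (hostpath-csi-basic is an assumed name for the HPP storage class):
#   oc patch storageclass hostpath-csi-basic -p "$(default_class_patch false)"
#   oc patch storageclass ocs-storagecluster-ceph-rbd -p "$(default_class_patch true)"
```

Clearing the old default first avoids a window in which two storage classes are both annotated as default.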

      4. Delete one DV 

      $ oc delete dv -n openshift-virtualization-os-images rhel9-b006ef7856b6
      datavolume.cdi.kubevirt.io "rhel9-b006ef7856b6" deleted

      5. The DV is not recreated (although it should be). A VolumeSnapshot is created, but it is not Ready

      $ oc get VolumeSnapshot -A
      NAMESPACE                            NAME                 READYTOUSE   SOURCEPVC            SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS                            SNAPSHOTCONTENT   CREATIONTIME   AGE
      openshift-virtualization-os-images   rhel9-b006ef7856b6   false        rhel9-b006ef7856b6                                         ocs-storagecluster-rbdplugin-snapclass                                    13s

      $ oc get VolumeSnapshot -n openshift-virtualization-os-images rhel9-b006ef7856b6 -oyaml
      apiVersion: snapshot.storage.k8s.io/v1
      kind: VolumeSnapshot
      metadata:
        annotations:
          cdi.kubevirt.io/storage.import.lastUseTime: "2023-07-27T14:31:32.631870881Z"
        creationTimestamp: "2023-07-27T14:31:32Z"
        finalizers:
        - snapshot.storage.kubernetes.io/volumesnapshot-as-source-protection
        generation: 1
        labels:
          app: containerized-data-importer
          app.kubernetes.io/component: storage
          app.kubernetes.io/managed-by: cdi-controller
          app.kubernetes.io/part-of: hyperconverged-cluster
          app.kubernetes.io/version: 4.14.0
          cdi.kubevirt.io: ""
          cdi.kubevirt.io/dataImportCron: rhel9-image-cron
        name: rhel9-b006ef7856b6
        namespace: openshift-virtualization-os-images
        resourceVersion: "1182048"
        uid: d69181d0-4195-4b3f-91b4-ba3631f05249
      spec:
        source:
          persistentVolumeClaimName: rhel9-b006ef7856b6
        volumeSnapshotClassName: ocs-storagecluster-rbdplugin-snapclass
      status:
        error:
          message: 'Failed to create snapshot content with error snapshot controller failed
            to update rhel9-b006ef7856b6 on API server: cannot get claim from snapshot'

      6. See that about 2 minutes later, VolumeSnapshots are created for the other images even though their old DVs have not been deleted yet

      $ oc get VolumeSnapshot -A
      NAMESPACE                            NAME                          READYTOUSE   SOURCEPVC                     SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS                            SNAPSHOTCONTENT                                    CREATIONTIME   AGE
      openshift-virtualization-os-images   centos-stream8-b9b768dcd73b   false        centos-stream8-b9b768dcd73b                                         ocs-storagecluster-rbdplugin-snapclass   snapcontent-8455f2ea-0d70-4998-9fa5-bbc42133b1f5                  23s
      openshift-virtualization-os-images   centos-stream9-362e1f1d9f11   false        centos-stream9-362e1f1d9f11                                         ocs-storagecluster-rbdplugin-snapclass   snapcontent-3eec6ff1-f73f-493f-b61b-58abfeec5b65                  23s
      openshift-virtualization-os-images   centos7-680e9b4e0fba          false        centos7-680e9b4e0fba                                                ocs-storagecluster-rbdplugin-snapclass   snapcontent-76229453-37ff-40f6-8ce0-94e15a5b912c                  23s
      openshift-virtualization-os-images   fedora-f7cc15256f08           false        fedora-f7cc15256f08                                                 ocs-storagecluster-rbdplugin-snapclass   snapcontent-94d05d80-20f5-4861-a7af-344f19842a61                  23s
      openshift-virtualization-os-images   rhel8-0da894200daa            false        rhel8-0da894200daa                                                  ocs-storagecluster-rbdplugin-snapclass   snapcontent-df7f9a06-4a2e-41b1-8f04-a16758daf4e8                  23s
      openshift-virtualization-os-images   rhel9-b006ef7856b6            false        rhel9-b006ef7856b6                                                  ocs-storagecluster-rbdplugin-snapclass                                                                     2m47s
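
To pick out the snapshots that are not ready from output like the table above, a small filter can help. This is only a sketch: not_ready_snapshots is a hypothetical helper name, and it assumes the `-A` table layout shown here, in which READYTOUSE is the third column.

```shell
# Print namespace/name for every VolumeSnapshot row whose READYTOUSE column is "false".
# Reads the table produced by `oc get VolumeSnapshot -A` on stdin.
# Assumes the -A layout: NAMESPACE, NAME, READYTOUSE are the first three columns.
not_ready_snapshots() {
  awk 'NR > 1 && $3 == "false" { print $1 "/" $2 }'
}

# Usage:
#   oc get VolumeSnapshot -A | not_ready_snapshots
```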

      7. See the YAML of another VolumeSnapshot, whose DV/PVC was not deleted and is still using the non-snapshotable HPP:

      spec:
        source:
          persistentVolumeClaimName: centos-stream8-b9b768dcd73b
        volumeSnapshotClassName: ocs-storagecluster-rbdplugin-snapclass
      status:
        boundVolumeSnapshotContentName: snapcontent-8455f2ea-0d70-4998-9fa5-bbc42133b1f5
        error:
          message: 'Failed to check and update snapshot content: failed to take snapshot
            of the volume pvc-e59ee8cd-57d0-4ecf-906f-0ab7a1f8ba72: "rpc error: code = Internal
            desc = panic runtime error: invalid memory address or nil pointer dereference"'
          time: "2023-07-27T14:33:56Z"
        readyToUse: false
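
One way to confirm Issue 1 for a given snapshot is to check which storage class backs its source PVC. The following is a sketch under that assumption; snapshot_source_sc is a hypothetical helper name, and it relies only on `oc get ... -o jsonpath`.

```shell
# Print the storage class of the PVC that a VolumeSnapshot names as its source.
# snapshot_source_sc is a hypothetical helper. Arguments: namespace, snapshot name.
snapshot_source_sc() {
  # Read the source PVC name from the snapshot spec.
  pvc="$(oc get volumesnapshot -n "$1" "$2" -o jsonpath='{.spec.source.persistentVolumeClaimName}')"
  # Look up the storage class recorded on that PVC.
  oc get pvc -n "$1" "$pvc" -o jsonpath='{.spec.storageClassName}'
}

# Usage:
#   snapshot_source_sc openshift-virtualization-os-images centos-stream8-b9b768dcd73b
```

If this prints the HPP storage class rather than ocs-storagecluster-ceph-rbd, the snapshot was requested from a PVC that is still on the old storage class.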

      8. To fix the broken VolumeSnapshot of the first deleted DV, delete that VolumeSnapshot

      $ oc delete VolumeSnapshot -n openshift-virtualization-os-images rhel9-b006ef7856b6
      volumesnapshot.snapshot.storage.k8s.io "rhel9-b006ef7856b6" deleted

      9. This triggers the DV/PVC to re-import on OCS and creates a VolumeSnapshot that becomes ReadyToUse, after which the DV/PVC is deleted automatically.
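
To verify that the recreated snapshot actually becomes ready, something like the following could be used. It is a sketch: wait_snapshot_ready is a hypothetical helper name, the 10m default timeout is arbitrary, and it assumes the new VolumeSnapshot reuses the old name and already exists when the command runs (`oc wait` fails right away if the object is not found).

```shell
# Block until the named VolumeSnapshot reports status.readyToUse=true, or time out.
# wait_snapshot_ready is a hypothetical helper.
# Arguments: namespace, snapshot name, optional timeout (defaults to 10m, an arbitrary choice).
wait_snapshot_ready() {
  oc wait volumesnapshot/"$2" -n "$1" --for=jsonpath='{.status.readyToUse}'=true --timeout="${3:-10m}"
}

# Usage:
#   wait_snapshot_ready openshift-virtualization-os-images rhel9-b006ef7856b6
```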

      Actual results:
      Re-importing requires extra manual steps (deleting the broken VolumeSnapshot).

      Expected results:
      Re-importing should happen once we switch the storage class and delete the old DVs.

              akalenyu Alex Kalenyuk
              jpeimer@redhat.com Jenia Peimer
              Harel Meir Harel Meir