OpenShift Virtualization / CNV-39213

CDI volume cloning expands sparse images which results in wasted storage space


    • Bug
    • Resolution: Done-Errata
    • Major
    • CNV v4.17.3
    • CNV v4.14.3
    • Storage Platform
    • CNV v4.17.0.rhel9-498
    • Storage Core Sprint 254, Storage Core Sprint 256, Storage Core Sprint 257, Storage Core Sprint 258, Storage Core Sprint 259, CNV Storage 260, Storage Core Sprint 262, Storage Core Sprint 263, CNV Storage 264

      Host-assisted volume cloning doesn't seem to handle sparse images correctly: the holes in the source image end up written out as zeros in the clone, which wastes storage space. The following are the steps to reproduce the issue:

      Create a volume using the ocs-storagecluster-ceph-rbd-virtualization StorageClass in Filesystem mode:

      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: myvol
        namespace: kubevirt-example
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 50Gi
        storageClassName: ocs-storagecluster-ceph-rbd-virtualization
        volumeMode: Filesystem

      Mount the volume in a pod and create a sparse disk.img file on the volume with 1 GiB of data and a 5 GiB virtual size:

      $ dd if=/dev/random of=disk.img bs=1024M count=1
      $ truncate -s 5G disk.img

      Check that the file is a sparse file:

      $ ls -ls --block-size=M disk.img
      1025M -rw-r--r--. 1 root root 5120M Mar  9 13:59 disk.img

      Check the PVC disk usage in Ceph:

      $ rbd du ocs-storagecluster-cephblockpool/csi-vol-bb781ad4-6ce4-4a66-8645-878c7d56f2c4
      NAME                                          PROVISIONED  USED
      csi-vol-bb781ad4-6ce4-4a66-8645-878c7d56f2c4       50 GiB  1.2 GiB

      Next, clone the volume to a Ceph RBD volume in Block mode by applying this DataVolume resource:

      apiVersion: cdi.kubevirt.io/v1beta1
      kind: DataVolume
      metadata:
        name: myvol-converted
      spec:
        source:
          pvc:
            name: myvol
            namespace: kubevirt-example
        storage:
          accessModes:
          - ReadWriteMany
          storageClassName: ocs-storagecluster-ceph-rbd-virtualization
          volumeMode: Block

      Follow the logs of the pods that perform the volume conversion:

      $ stern .
      + cdi-upload-tmp-pvc-fbe662e2-f95a-4149-9549-4f7abed41072 › cdi-upload-server
      cdi-upload-tmp-pvc-fbe662e2-f95a-4149-9549-4f7abed41072 cdi-upload-server I0309 14:03:15.326876       1 uploadserver.go:74] Running server on 0.0.0.0:8443
      - toolbox-container-6f56f456dd-fntvb › toolbox-container
      cdi-upload-tmp-pvc-fbe662e2-f95a-4149-9549-4f7abed41072 cdi-upload-server I0309 14:03:56.552251       1 uploadserver.go:389] Content type header is "filesystem-clone"
      cdi-upload-tmp-pvc-fbe662e2-f95a-4149-9549-4f7abed41072 cdi-upload-server I0309 14:03:56.552488       1 uploadserver.go:493] Untaring 5368709120 bytes to /dev/cdi-block-volume
      + 9644db8b-9b66-40ce-b245-f514cd6027e0-source-pod › cdi-clone-source
      9644db8b-9b66-40ce-b245-f514cd6027e0-source-pod cdi-clone-source VOLUME_MODE=filesystem
      9644db8b-9b66-40ce-b245-f514cd6027e0-source-pod cdi-clone-source MOUNT_POINT=/var/run/cdi/clone/source
      9644db8b-9b66-40ce-b245-f514cd6027e0-source-pod cdi-clone-source /var/run/cdi/clone/source /
      9644db8b-9b66-40ce-b245-f514cd6027e0-source-pod cdi-clone-source UPLOAD_BYTES=5368729600
      9644db8b-9b66-40ce-b245-f514cd6027e0-source-pod cdi-clone-source I0309 14:03:56.238237      10 clone-source.go:220] content-type is "filesystem-clone"
      9644db8b-9b66-40ce-b245-f514cd6027e0-source-pod cdi-clone-source I0309 14:03:56.238291      10 clone-source.go:221] mount is "/var/run/cdi/clone/source"
      9644db8b-9b66-40ce-b245-f514cd6027e0-source-pod cdi-clone-source I0309 14:03:56.238295      10 clone-source.go:222] upload-bytes is 5368729600
      9644db8b-9b66-40ce-b245-f514cd6027e0-source-pod cdi-clone-source I0309 14:03:56.238310      10 clone-source.go:239] Starting cloner target
      9644db8b-9b66-40ce-b245-f514cd6027e0-source-pod cdi-clone-source I0309 14:03:56.238359      10 clone-source.go:177] Executing [/usr/bin/tar cv -S disk.img]
      9644db8b-9b66-40ce-b245-f514cd6027e0-source-pod cdi-clone-source I0309 14:03:56.539802      10 clone-source.go:251] Set header to filesystem-clone
      9644db8b-9b66-40ce-b245-f514cd6027e0-source-pod cdi-clone-source I0309 14:03:57.238841      10 prometheus.go:75] 1.14
      9644db8b-9b66-40ce-b245-f514cd6027e0-source-pod cdi-clone-source I0309 14:03:58.238876      10 prometheus.go:75] 2.65
      9644db8b-9b66-40ce-b245-f514cd6027e0-source-pod cdi-clone-source I0309 14:03:59.239334      10 prometheus.go:75] 4.28
      9644db8b-9b66-40ce-b245-f514cd6027e0-source-pod cdi-clone-source I0309 14:04:00.239646      10 prometheus.go:75] 4.46
      9644db8b-9b66-40ce-b245-f514cd6027e0-source-pod cdi-clone-source I0309 14:04:01.240023      10 prometheus.go:75] 4.50
      9644db8b-9b66-40ce-b245-f514cd6027e0-source-pod cdi-clone-source I0309 14:04:02.241024      10 prometheus.go:75] 5.03
      9644db8b-9b66-40ce-b245-f514cd6027e0-source-pod cdi-clone-source I0309 14:04:03.241686      10 prometheus.go:75] 6.73
      9644db8b-9b66-40ce-b245-f514cd6027e0-source-pod cdi-clone-source I0309 14:04:04.242148      10 prometheus.go:75] 8.02
      9644db8b-9b66-40ce-b245-f514cd6027e0-source-pod cdi-clone-source I0309 14:04:05.243077      10 prometheus.go:75] 9.13
      9644db8b-9b66-40ce-b245-f514cd6027e0-source-pod cdi-clone-source I0309 14:04:06.243825      10 prometheus.go:75] 10.59
      9644db8b-9b66-40ce-b245-f514cd6027e0-source-pod cdi-clone-source I0309 14:04:07.243902      10 prometheus.go:75] 11.64
      9644db8b-9b66-40ce-b245-f514cd6027e0-source-pod cdi-clone-source I0309 14:04:08.244745      10 prometheus.go:75] 12.29
      9644db8b-9b66-40ce-b245-f514cd6027e0-source-pod cdi-clone-source I0309 14:04:09.244822      10 prometheus.go:75] 13.71
      9644db8b-9b66-40ce-b245-f514cd6027e0-source-pod cdi-clone-source I0309 14:04:10.244881      10 prometheus.go:75] 15.05
      9644db8b-9b66-40ce-b245-f514cd6027e0-source-pod cdi-clone-source I0309 14:04:11.245753      10 prometheus.go:75] 16.16
      9644db8b-9b66-40ce-b245-f514cd6027e0-source-pod cdi-clone-source I0309 14:04:12.246252      10 prometheus.go:75] 17.71
      9644db8b-9b66-40ce-b245-f514cd6027e0-source-pod cdi-clone-source I0309 14:04:13.247379      10 prometheus.go:75] 18.81
      9644db8b-9b66-40ce-b245-f514cd6027e0-source-pod cdi-clone-source I0309 14:04:14.247841      10 prometheus.go:75] 19.92
      9644db8b-9b66-40ce-b245-f514cd6027e0-source-pod cdi-clone-source I0309 14:04:14.292635      10 clone-source.go:127] Wrote 1073745920 bytes
      9644db8b-9b66-40ce-b245-f514cd6027e0-source-pod cdi-clone-source I0309 14:04:15.248266      10 prometheus.go:75] 100.00
      cdi-upload-tmp-pvc-fbe662e2-f95a-4149-9549-4f7abed41072 cdi-upload-server I0309 14:04:46.778822       1 uploadserver.go:502] Written 5368709120
      cdi-upload-tmp-pvc-fbe662e2-f95a-4149-9549-4f7abed41072 cdi-upload-server I0309 14:04:46.778866       1 uploadserver.go:416] Wrote data to /dev/cdi-block-volume
      9644db8b-9b66-40ce-b245-f514cd6027e0-source-pod cdi-clone-source I0309 14:04:46.779122      10 clone-source.go:269] Response body:
      cdi-upload-tmp-pvc-fbe662e2-f95a-4149-9549-4f7abed41072 cdi-upload-server I0309 14:04:46.778949       1 uploadserver.go:203] Shutting down http server after successful upload
      9644db8b-9b66-40ce-b245-f514cd6027e0-source-pod cdi-clone-source I0309 14:04:46.779151      10 clone-source.go:271] clone complete
      9644db8b-9b66-40ce-b245-f514cd6027e0-source-pod cdi-clone-source /
      cdi-upload-tmp-pvc-fbe662e2-f95a-4149-9549-4f7abed41072 cdi-upload-server I0309 14:04:46.779311       1 uploadserver.go:103] UploadServer successfully exited
      - 9644db8b-9b66-40ce-b245-f514cd6027e0-source-pod › cdi-clone-source
      - cdi-upload-tmp-pvc-fbe662e2-f95a-4149-9549-4f7abed41072 › cdi-upload-server

      The resulting clone PVC volume definition:

      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        annotations:
          cdi.kubevirt.io/cloneFallbackReason: The volume modes of source and target are
            incompatible
          cdi.kubevirt.io/clonePhase: Succeeded
          cdi.kubevirt.io/cloneType: copy
          cdi.kubevirt.io/storage.condition.running: "false"
          cdi.kubevirt.io/storage.condition.running.message: Clone Complete
          cdi.kubevirt.io/storage.condition.running.reason: Completed
          cdi.kubevirt.io/storage.contentType: kubevirt
          cdi.kubevirt.io/storage.pod.restarts: "0"
          cdi.kubevirt.io/storage.populator.progress: 100.0%
          cdi.kubevirt.io/storage.preallocation.requested: "false"
          cdi.kubevirt.io/storage.usePopulator: "true"
          pv.kubernetes.io/bind-completed: "yes"
          pv.kubernetes.io/bound-by-controller: "yes"
          volume.beta.kubernetes.io/storage-provisioner: openshift-storage.rbd.csi.ceph.com
          volume.kubernetes.io/storage-provisioner: openshift-storage.rbd.csi.ceph.com
        creationTimestamp: "2024-03-09T13:27:05Z"
        finalizers:
        - kubernetes.io/pvc-protection
        labels:
          app: containerized-data-importer
          app.kubernetes.io/component: storage
          app.kubernetes.io/managed-by: cdi-controller
          app.kubernetes.io/part-of: hyperconverged-cluster
          app.kubernetes.io/version: 4.14.3
        name: myvol-converted
        namespace: kubevirt-example
        ownerReferences:
        - apiVersion: cdi.kubevirt.io/v1beta1
          blockOwnerDeletion: true
          controller: true
          kind: DataVolume
          name: myvol-converted
          uid: db156582-fee8-476b-9ac7-92e367ebc925
        resourceVersion: "1011506"
        uid: 2375704c-b13f-4da0-b49d-91cac6a6d247
      spec:
        accessModes:
        - ReadWriteMany
        dataSource:
          apiGroup: cdi.kubevirt.io
          kind: VolumeCloneSource
          name: volume-clone-source-db156582-fee8-476b-9ac7-92e367ebc925
        dataSourceRef:
          apiGroup: cdi.kubevirt.io
          kind: VolumeCloneSource
          name: volume-clone-source-db156582-fee8-476b-9ac7-92e367ebc925
        resources:
          requests:
            storage: "53687091200"
        storageClassName: ocs-storagecluster-ceph-rbd-virtualization
        volumeMode: Block
        volumeName: pvc-0c4dc9cc-7629-45f5-8e70-47a4875147b3
      status:
        accessModes:
        - ReadWriteMany
        capacity:
          storage: 50Gi
        phase: Bound 

      Check the clone PVC disk usage in Ceph:

      $ rbd du ocs-storagecluster-cephblockpool/csi-vol-56ac4e33-6fcf-4aa9-93c2-2c1337f4c86c
      NAME                                          PROVISIONED  USED
      csi-vol-56ac4e33-6fcf-4aa9-93c2-2c1337f4c86c       50 GiB  5 GiB

      Note that the above result demonstrates two issues:

      1. The virtual size of the original volume was 5 GiB, so the resulting block PVC was expected to be provisioned at 5 GiB. Instead, a 50 GiB PVC was provisioned. While running the test, I did not observe any virtual size detection being performed during the cloning that would have detected the 5 GiB size.
      2. The original volume used 1.2 GiB of disk space, so the resulting volume was expected to use about 1.2 GiB of disk space as well. Instead, the sparse image was incorrectly expanded to its full virtual size of 5 GiB.

      The clone-source pod uses the "/usr/bin/tar cv -S disk.img" command to stream the sparse image, so the problem is likely on the upload-server side. The upload server uses io.Copy(), which doesn't preserve sparseness, as described in the Stack Overflow entry "Sparse files are huge with io.Copy()". The holes in the sparse image are most likely being written out as literal zeros by io.Copy().
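      For illustration only, below is a minimal sketch of a sparse-aware alternative to io.Copy(). It is not CDI's actual code; the sparseCopy function, its writeSeeker interface, and the 64 KiB block size are made up for this example. The idea is to detect all-zero blocks in the incoming stream and seek over them on the destination instead of writing them, so a thin-provisioned target never allocates space for the holes:

      // sparsecopy.go -- illustrative sketch only, not the CDI upload-server implementation.
      package main

      import (
          "bytes"
          "io"
          "log"
          "os"
      )

      const blockSize = 64 * 1024 // granularity at which zero runs are detected (hypothetical)

      // writeSeeker is satisfied by *os.File, including a file opened on a block
      // device such as /dev/cdi-block-volume.
      type writeSeeker interface {
          io.Writer
          io.Seeker
      }

      // sparseCopy copies src to dst but seeks over all-zero blocks instead of
      // writing them, so thin-provisioned storage underneath dst never allocates
      // space for the holes. It returns the number of bytes of src consumed,
      // holes included.
      func sparseCopy(dst writeSeeker, src io.Reader) (int64, error) {
          buf := make([]byte, blockSize)
          zeros := make([]byte, blockSize)
          var copied int64
          for {
              n, rerr := io.ReadFull(src, buf)
              if n > 0 {
                  if bytes.Equal(buf[:n], zeros[:n]) {
                      // All-zero block: advance the write offset without writing.
                      // A trailing hole on a regular file would also need a final
                      // Truncate or byte write to extend the file; a block device
                      // already has a fixed size.
                      if _, err := dst.Seek(int64(n), io.SeekCurrent); err != nil {
                          return copied, err
                      }
                  } else if _, err := dst.Write(buf[:n]); err != nil {
                      return copied, err
                  }
                  copied += int64(n)
              }
              if rerr == io.EOF || rerr == io.ErrUnexpectedEOF {
                  return copied, nil
              }
              if rerr != nil {
                  return copied, rerr
              }
          }
      }

      func main() {
          // Usage: sparsecopy <src> <dst>; dst may be a regular file or a block device.
          if len(os.Args) != 3 {
              log.Fatal("usage: sparsecopy <src> <dst>")
          }
          src, err := os.Open(os.Args[1])
          if err != nil {
              log.Fatal(err)
          }
          defer src.Close()
          dst, err := os.OpenFile(os.Args[2], os.O_WRONLY|os.O_CREATE, 0o644)
          if err != nil {
              log.Fatal(err)
          }
          defer dst.Close()
          n, err := sparseCopy(dst, src)
          if err != nil {
              log.Fatal(err)
          }
          log.Printf("consumed %d bytes from source (zero blocks skipped on write)", n)
      }

      The underlying point is that once the holes of a "tar -S" stream have been turned back into zero bytes, sparseness can only be preserved if the copier either skips zero runs (as in the sketch above) or punches holes on the target; io.Copy() alone writes every byte it reads.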

      To turn the expanded image back into a sparse one, one can use the rbd sparsify command:

      $ rbd sparsify ocs-storagecluster-cephblockpool/csi-vol-56ac4e33-6fcf-4aa9-93c2-2c1337f4c86c 

      After sparsifying the volume, its disk usage dropped from 5 GiB to 1 GiB. The resulting 1 GiB is smaller than the original 1.2 GiB, probably because the original volume also includes filesystem overhead:

      $ rbd du ocs-storagecluster-cephblockpool/csi-vol-56ac4e33-6fcf-4aa9-93c2-2c1337f4c86c
      NAME                                          PROVISIONED  USED
      csi-vol-56ac4e33-6fcf-4aa9-93c2-2c1337f4c86c       50 GiB  1 GiB 

              mhenriks@redhat.com Michael Henriksen
              anosek@redhat.com Ales Nosek
              Dalia Frank Dalia Frank
              Votes: 0
              Watchers: 9