OpenShift Virtualization / CNV-45701

Volume copying never finishes for volumes that don't support seek hole detection


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • Affects Version: CNV v4.18.1
    • Component: Storage Platform

      We were converting existing Portworx RWX volumes to Ceph RBD volumes. The Portworx RWX volumes use volume mode Filesystem and are backed by NFSv3. Here is an example of a DataVolume definition used to perform the volume conversion:

      apiVersion: cdi.kubevirt.io/v1beta1
      kind: DataVolume
      metadata:
        name: anosek-test2
        namespace: anosek-volumetest
      spec:
        source:
          pvc:
            namespace: anosek-volumetest
            name: toolbox-container-home
        storage:
          accessModes:
          - ReadWriteMany
          storageClassName: ocs-storagecluster-ceph-rbd-virtualization
          volumeMode: Block 

      We had issues with some rather large volumes (> 1 TB). For these volumes, the conversion never finished, and the source pod logs showed 0.00 data copied. The 0.00 never increased even after the conversion had been running for 16 hours:

      VOLUME_MODE=filesystem
      MOUNT_POINT=/var/run/cdi/clone/source
      /var/run/cdi/clone/source /
      UPLOAD_BYTES=75161931776
      I0805 13:17:30.771576      10 clone-source.go:220] content-type is "filesystem-clone"
      I0805 13:17:30.771650      10 clone-source.go:221] mount is "/var/run/cdi/clone/source"
      I0805 13:17:30.771657      10 clone-source.go:222] upload-bytes is 75161931776
      I0805 13:17:30.771670      10 clone-source.go:239] Starting cloner target
      I0805 13:17:30.772054      10 clone-source.go:177] Executing [/usr/bin/tar cv -S disk.img]
      I0805 13:17:31.669967      10 clone-source.go:251] Set header to filesystem-clone
      I0805 13:17:31.685652       1 uploadserver.go:389] Content type header is "filesystem-clone"
      I0805 13:17:31.773938      10 prometheus.go:75] 0.00
      I0805 13:17:32.774652      10 prometheus.go:75] 0.00
      I0805 13:17:33.774788      10 prometheus.go:75] 0.00
      I0805 13:17:34.775655      10 prometheus.go:75] 0.00
      I0805 13:17:35.776723      10 prometheus.go:75] 0.00
      I0805 13:17:36.778371      10 prometheus.go:75] 0.00
      I0805 13:17:37.779667      10 prometheus.go:75] 0.00
      I0805 13:17:38.780654      10 prometheus.go:75] 0.00
      I0805 13:17:39.780899      10 prometheus.go:75] 0.00
      I0805 13:17:40.781693      10 prometheus.go:75] 0.00
      I0805 13:17:41.782667      10 prometheus.go:75] 0.00
      I0805 13:17:42.782997      10 prometheus.go:75] 0.00
      I0805 13:17:43.784318      10 prometheus.go:75] 0.00
      I0805 13:17:44.784948      10 prometheus.go:75] 0.00
      I0805 13:17:45.785671      10 prometheus.go:75] 0.00
      I0805 13:17:46.786665      10 prometheus.go:75] 0.00
      I0805 13:17:47.787738      10 prometheus.go:75] 0.00
      I0805 13:17:48.788669      10 prometheus.go:75] 0.00
      I0805 13:17:49.789676      10 prometheus.go:75] 0.00
      I0805 13:17:50.790721      10 prometheus.go:75] 0.00
      I0805 13:17:51.791725      10 prometheus.go:75] 0.00
      ... 

      Looking into the issue, we found that the source pod uses the following command to read the volume:

      $ tar cv -S disk.img 

      While the logs were showing 0.00 progress, we straced the tar command running in the source pod. tar was actually busy reading the disk image data but not writing anything into the pipe. According to the tar documentation, the problem is likely that our filesystem doesn't support lseek with SEEK_HOLE and SEEK_DATA. In that case, tar reads the whole disk image twice: the first pass to find the holes and the second pass to copy the data.
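
      As a quick check, the mount can be probed for hole-detection support by asking lseek(2) for the first hole in the image. A hedged example using xfs_io from xfsprogs (not part of the CDI tooling; xfs_io may not be present in the source pod image, and the path below is the mount point from the logs above):

      $ xfs_io -r -c "seek -h 0" /var/run/cdi/clone/source/disk.img

      On a filesystem with working SEEK_HOLE support, a sparse image reports a hole starting before the end of the file. If the call fails, or the only "hole" ever reported is at end of file even for a known-sparse image, hole detection is effectively unusable and tar -S has to scan the data itself, as described above.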

      Reading a 1 TB volume twice is very inefficient and takes a lot of time. The first pass took about 1 hour 40 minutes to complete. During this period, no data was sent over the open connection between the source pod and the upload server. We suspect that before any data could be copied over, the idle connection timed out, so the conversion could never finish.

      To work around the issue, we set DataVolume.spec.preallocation = true. Updated example:

      apiVersion: cdi.kubevirt.io/v1beta1
      kind: DataVolume
      metadata:
        name: anosek-test2
        namespace: anosek-volumetest
      spec:
        source:
          pvc:
            namespace: anosek-volumetest
            name: toolbox-container-home
        storage:
          accessModes:
          - ReadWriteMany
          storageClassName: ocs-storagecluster-ceph-rbd-virtualization
          volumeMode: Block
        preallocation: true 

      With preallocation = true, the generated tar command no longer includes the -S parameter:

      $ tar cv disk.img

      The volume conversion succeeds. After the conversion is complete, we run rbd sparsify on the resulting volume to make it sparse again.
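
      For reference, a sketch of the sparsify step, assuming the rook-ceph-tools toolbox is deployed in openshift-storage. The pool and image names are placeholders; for ceph-csi-provisioned PVs, the backing image name is typically listed in the PV's spec.csi.volumeAttributes:

      $ oc -n openshift-storage rsh deploy/rook-ceph-tools
      sh-5.1$ rbd sparsify <pool-name>/<image-name>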

              Alvaro Romero (rh-ee-alromero)
              Ales Nosek (anosek@redhat.com)
              Kevin Alon Goldblatt