We were converting existing Portworx RWX volumes to Ceph RBD volumes. The Portworx RWX volumes use volumeMode: Filesystem and are implemented using NFSv3. Here is an example of a DataVolume definition used to perform the volume conversion:
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: anosek-test2
  namespace: anosek-volumetest
spec:
  source:
    pvc:
      namespace: anosek-volumetest
      name: toolbox-container-home
  storage:
    accessModes:
      - ReadWriteMany
    storageClassName: ocs-storagecluster-ceph-rbd-virtualization
    volumeMode: Block
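For reference, a minimal sketch of how such a DataVolume can be applied and monitored (the file name datavolume.yaml is a placeholder; we assume the oc CLI, kubectl works the same way):
$ oc apply -f datavolume.yaml
$ oc -n anosek-volumetest get dv anosek-test2 -w   # watch the clone phase and progress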
We had issues with some rather large volumes (> 1 TB). For these volumes, the conversion would never finish and the source pod logs showed 0.00 data copied. This value never increased, even after the conversion had been running for 16 hours:
VOLUME_MODE=filesystem
MOUNT_POINT=/var/run/cdi/clone/source
/var/run/cdi/clone/source /
UPLOAD_BYTES=75161931776
I0805 13:17:30.771576 10 clone-source.go:220] content-type is "filesystem-clone"
I0805 13:17:30.771650 10 clone-source.go:221] mount is "/var/run/cdi/clone/source"
I0805 13:17:30.771657 10 clone-source.go:222] upload-bytes is 75161931776
I0805 13:17:30.771670 10 clone-source.go:239] Starting cloner target
I0805 13:17:30.772054 10 clone-source.go:177] Executing [/usr/bin/tar cv -S disk.img]
I0805 13:17:31.669967 10 clone-source.go:251] Set header to filesystem-clone
I0805 13:17:31.685652 1 uploadserver.go:389] Content type header is "filesystem-clone"
I0805 13:17:31.773938 10 prometheus.go:75] 0.00
I0805 13:17:32.774652 10 prometheus.go:75] 0.00
I0805 13:17:33.774788 10 prometheus.go:75] 0.00
I0805 13:17:34.775655 10 prometheus.go:75] 0.00
I0805 13:17:35.776723 10 prometheus.go:75] 0.00
I0805 13:17:36.778371 10 prometheus.go:75] 0.00
I0805 13:17:37.779667 10 prometheus.go:75] 0.00
I0805 13:17:38.780654 10 prometheus.go:75] 0.00
I0805 13:17:39.780899 10 prometheus.go:75] 0.00
I0805 13:17:40.781693 10 prometheus.go:75] 0.00
I0805 13:17:41.782667 10 prometheus.go:75] 0.00
I0805 13:17:42.782997 10 prometheus.go:75] 0.00
I0805 13:17:43.784318 10 prometheus.go:75] 0.00
I0805 13:17:44.784948 10 prometheus.go:75] 0.00
I0805 13:17:45.785671 10 prometheus.go:75] 0.00
I0805 13:17:46.786665 10 prometheus.go:75] 0.00
I0805 13:17:47.787738 10 prometheus.go:75] 0.00
I0805 13:17:48.788669 10 prometheus.go:75] 0.00
I0805 13:17:49.789676 10 prometheus.go:75] 0.00
I0805 13:17:50.790721 10 prometheus.go:75] 0.00
I0805 13:17:51.791725 10 prometheus.go:75] 0.00
...
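These logs come from the CDI clone source pod. A minimal sketch of how such logs can be followed (the pod name is a placeholder; the actual name is generated by CDI):
$ oc -n anosek-volumetest logs -f <cdi-clone-source-pod>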
Looking into the issue, we realized that the source pod uses the following command to read the volume:
$ tar cv -S disk.img
While the logs were showing 0.00 progress, we straced the tar command running in the source pod. tar was actually busy reading the disk image data but not writing anything into the pipe. According to the tar documentation, the likely problem is that our filesystem does not support lseek with SEEK_HOLE and SEEK_DATA. In that case, tar reads the whole disk image twice: the first pass to find the holes and the second pass to copy the data.
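For illustration, a minimal sketch of how this behavior can be observed (the tar PID is a placeholder, and strace has to run in a context that can attach to processes in the source pod):
$ strace -f -p <tar-pid> -e trace=read,write,lseek    # during the first pass: reads from disk.img, no writes to the pipe
$ du --block-size=1 disk.img                          # allocated bytes
$ du --apparent-size --block-size=1 disk.img          # logical size; much larger than the allocated bytes for a sparse image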
Reading a 1 TB volume twice is very inefficient and takes a lot of time. The first pass alone took about 1 hour 40 minutes to complete. During this period, no data was sent over the open connection between the source pod and the upload server. We suspect that the idle connection timed out before any data could be copied, so the conversion could never finish.
To work around the issue, we set DataVolume.spec.preallocation to true. Updated example:
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: anosek-test2
  namespace: anosek-volumetest
spec:
  source:
    pvc:
      namespace: anosek-volumetest
      name: toolbox-container-home
  storage:
    accessModes:
      - ReadWriteMany
    storageClassName: ocs-storagecluster-ceph-rbd-virtualization
    volumeMode: Block
  preallocation: true
With preallocation: true, the generated tar command no longer includes the -S (--sparse) parameter:
$ tar cv disk.img
With this change, the volume conversion succeeds. After the conversion is complete, we run rbd sparsify on the resulting volume to make it sparse again.
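For completeness, a sketch of that final step (the pool and image names are placeholders; for Ceph CSI volumes the image name can typically be found in the PersistentVolume's CSI volume attributes):
$ rbd sparsify <pool-name>/<rbd-image-name>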