Bug
Resolution: Unresolved
Normal
rhel-10.0
Moderate
rhel-virt-storage
When using qemu-img convert with -o preallocation=falloc against a target that does not support the fallocate(2) syscall (such as NFS v4.1), performance is significantly worse than with preallocation=full or with sparseness disabled via -S 0.
NVR: qemu-img-9.2.4-2.fc42.x86_64
Steps to Reproduce:
- Mount an NFS share that does not support fallocate (e.g., NFS v4.1).
- Perform a qemu-img convert of a standard image to this mount using three different methods:
  - -o preallocation=falloc
  - -o preallocation=full
  - -S 0
- Compare execution times.
Expected results:
When falloc is not supported by the target, execution time should be approximately the same for all three methods.
Actual results:
comparing preallocation methods used against the same NFS target:
root@nassouli-thinkpadp1gen7 /private $ time qemu-img convert -o preallocation=falloc -t writeback -p -O raw cirros-0.5.2-x86_64-disk.img /private/nfs/noam/disk.img
(100.00/100%)
real 1m34.269s
user 0m0.030s
sys 0m0.121s

root@nassouli-thinkpadp1gen7 /private $ time qemu-img convert -o preallocation=full -t writeback -p -O raw cirros-0.5.2-x86_64-disk.img /private/nfs/noam/disk.img
(100.00/100%)
real 0m31.466s
user 0m0.035s
sys 0m0.044s

root@nassouli-thinkpadp1gen7 /private $ time qemu-img convert -S 0 -t writeback -p -O raw cirros-0.5.2-x86_64-disk.img /private/nfs/noam/disk.img
(100.00/100%)
real 0m23.400s
user 0m0.041s
sys 0m0.044s
In our testing, falloc was unexpectedly the slowest method when unsupported by the underlying filesystem: at 94.3s it took ~3x longer than preallocation=full (31.5s) and ~4x longer than -S 0 (23.4s).
We would expect qemu-img to handle cases where fallocate is unsupported more gracefully, perhaps falling back to a method comparable to preallocation=full rather than the extremely slow glibc fallback currently encountered.
Root Cause Analysis:
strace -f shows that fallocate(2) is unavailable on this target; the call fails with EOPNOTSUPP:
[pid 791095] fallocate(8, 0, 0, 117440512) = -1 EOPNOTSUPP (Operation not supported) ......
QEMU calls posix_fallocate. When the kernel rejects fallocate(2), the glibc fallback implementation of posix_fallocate allocates space by writing a single zero byte into every 4KB block (at the last byte of each block, as the offsets in the strace output show).
This results in image_size / 4KB separate write operations, causing severe performance degradation due to high IOPS usage.
Strace snippet showing 1-byte writes every 4KB:
[pid 791095] pwrite64(8, "\0", 1, 4095) = 1
[pid 791095] pwrite64(8, "\0", 1, 8191) = 1
[pid 791095] pwrite64(8, "\0", 1, 12287) = 1
[pid 791095] pwrite64(8, "\0", 1, 16383) = 1
[pid 791095] pwrite64(8, "\0", 1, 20479) = 1
[pid 791095] pwrite64(8, "\0", 1, 24575) = 1
[pid 791095] pwrite64(8, "\0", 1, 28671) = 1
[pid 791095] pwrite64(8, "\0", 1, 32767) = 1
[pid 791095] pwrite64(8, "\0", 1, 36863) = 1
.........
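To illustrate the pattern in the strace output, here is a small Python model of a one-byte-per-block emulation. It is a simplified sketch that reproduces the observed offsets, not glibc's actual code; the function name is hypothetical, the 4096-byte block size is taken from the strace offsets, and the length 117440512 is the value passed to fallocate in the trace above.

```python
from itertools import islice


def fallback_write_offsets(offset, length, block_size=4096):
    """Yield the offsets at which a one-byte-per-block emulation would write.

    Simplified model (an assumption, not glibc source): one byte is written
    per block, positioned so that the final write lands on the last byte of
    the requested range.
    """
    if length <= 0:
        return
    pos = offset + (length - 1) % block_size
    end = offset + length
    while pos < end:
        yield pos
        pos += block_size


LENGTH = 117440512  # length argument seen in the fallocate() call above

first_nine = list(islice(fallback_write_offsets(0, LENGTH), 9))
total_writes = sum(1 for _ in fallback_write_offsets(0, LENGTH))

print(first_nine)    # [4095, 8191, 12287, 16383, 20479, 24575, 28671, 32767, 36863]
print(total_writes)  # 28672
```

For this image that is 28672 separate one-byte writes before any image data is copied.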
By comparison:
- preallocation=full uses 64KB buffers (resulting in significantly fewer write operations than the 4KB fallback).
- -S 0 skips the preallocation step entirely and just converts every block, resulting in the least total I/O in this scenario.
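As a rough check of the difference in write counts for the preallocation step alone, the following Python sketch compares 4KB-stride and 64KB-buffer writes. It assumes the 117440512-byte length from the strace output is the full preallocated size; the helper name is hypothetical, and the figures cover only the preallocation phase, not the subsequent data copy.

```python
IMAGE_SIZE = 117440512  # bytes, from the fallocate() call in the strace output


def write_ops(size, chunk):
    """Number of write calls needed to cover `size` bytes in `chunk`-byte steps."""
    return -(-size // chunk)  # ceiling division


fallback_ops = write_ops(IMAGE_SIZE, 4 * 1024)   # one 1-byte write per 4KB block
full_ops = write_ops(IMAGE_SIZE, 64 * 1024)      # one 64KB zero buffer per write

print(fallback_ops)              # 28672
print(full_ops)                  # 1792
print(fallback_ops // full_ops)  # 16
```

Under these assumptions the emulated falloc path issues 16 times as many write calls as preallocation=full for the same range.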
Conclusion:
qemu-img should ideally detect this inefficient fallback scenario and utilize a better method (like the 64KB buffers used in full or similar) when native fallocate is unsupported.
Alternatively, a new option that "works well in both cases" (smart auto-selection of the fastest available preallocation method) would be beneficial to avoid manual pre-checks of target storage capabilities.
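The first suggestion could look roughly like the following. This is a Python illustration of the idea only, not QEMU code: it calls the raw fallocate(2) wrapper through ctypes so that glibc's posix_fallocate emulation is never triggered, and falls back to 64KB zero writes when the call reports EOPNOTSUPP or ENOSYS. All function names are hypothetical, and the zero-write fallback assumes a freshly created target, since it would overwrite existing data.

```python
import ctypes
import errno
import os
import tempfile

ZERO_CHUNK = 64 * 1024  # buffer size the report attributes to preallocation=full


def native_fallocate(fd, length):
    """Try fallocate(2) directly; return False if it is unsupported here."""
    try:
        libc = ctypes.CDLL(None, use_errno=True)
    except OSError:
        return False
    func = None
    for name in ("fallocate64", "fallocate"):
        func = getattr(libc, name, None)
        if func is not None:
            break
    if func is None:
        return False  # no fallocate wrapper available on this platform
    func.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_int64, ctypes.c_int64]
    func.restype = ctypes.c_int
    if func(fd, 0, 0, length) == 0:
        return True
    err = ctypes.get_errno()
    if err in (errno.EOPNOTSUPP, errno.ENOSYS):
        return False
    raise OSError(err, os.strerror(err))


def write_zeros(fd, length, chunk=ZERO_CHUNK):
    """Allocate by writing zero-filled buffers of `chunk` bytes."""
    buf = bytes(chunk)
    written = 0
    while written < length:
        n = min(chunk, length - written)
        written += os.pwrite(fd, buf[:n], written)


def preallocate(fd, length):
    """Use native fallocate when available, otherwise large zero writes."""
    if native_fallocate(fd, length):
        return "fallocate"
    write_zeros(fd, length)
    return "zero-write"


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        print(preallocate(f.fileno(), 1024 * 1024))
```

Which branch is taken depends on the filesystem backing the file, so a caller would not need to check the target's capabilities in advance.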