RHEL-127167

qemu-img with preallocation=falloc exhibits significant performance degradation on targets lacking fallocate support (e.g., NFS v4.1)


    • Bug
    • Resolution: Unresolved
    • Normal
    • rhel-10.0
    • qemu-kvm / Storage
    • Moderate
    • rhel-virt-storage

      When using qemu-img convert with -o preallocation=falloc against a target that does not support the fallocate(2) syscall (such as NFS v4.1), performance is significantly worse than with preallocation=full or with sparse detection disabled via -S 0.

      NVR: qemu-img-9.2.4-2.fc42.x86_64

      Steps to Reproduce:

      1. Mount an NFS share that does not support fallocate (e.g., NFS v4.1).
      2. Perform a qemu-img convert of a standard image to this mount using three different methods:
        1. -o preallocation=falloc
        2. -o preallocation=full
        3. -S 0

      3. Compare execution times.
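
The three runs can be scripted so that each method is timed in the same way. This is a minimal sketch, assuming qemu-img is on PATH. The helper names `build_commands` and `time_runs` are hypothetical, and the source and target paths are supplied by the caller.

```python
import subprocess
import time

# The three variants from the steps above; all other flags stay identical.
METHODS = {
    "falloc": ["-o", "preallocation=falloc"],
    "full": ["-o", "preallocation=full"],
    "S0": ["-S", "0"],
}


def build_commands(src, dst):
    """Return {label: argv} for the three qemu-img convert invocations."""
    # -p (progress) is left out because it only affects console output.
    common = ["-t", "writeback", "-O", "raw", src, dst]
    return {label: ["qemu-img", "convert", *opts, *common]
            for label, opts in METHODS.items()}


def time_runs(src, dst):
    """Run each variant once and return {label: elapsed seconds}."""
    results = {}
    for label, argv in build_commands(src, dst).items():
        start = time.monotonic()
        subprocess.run(argv, check=True)
        results[label] = time.monotonic() - start
    return results
```

Calling `time_runs(source_image, target_on_nfs)` returns one wall-clock figure per method. A single run per method is noisy, so repeating the measurement is advisable before drawing conclusions.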

      Expected results:
      When falloc is not supported by the target, execution time should be approximately the same for all three methods.

      Actual results:
      Timings for the three methods against the same NFS target:

      root@nassouli-thinkpadp1gen7 /private $ time qemu-img convert -o preallocation=falloc -t writeback -p -O raw cirros-0.5.2-x86_64-disk.img /private/nfs/noam/disk.img
      (100.00/100%)
      real 1m34.269s
      user 0m0.030s
      sys 0m0.121s
      root@nassouli-thinkpadp1gen7 /private $ time qemu-img convert -o preallocation=full -t writeback -p -O raw cirros-0.5.2-x86_64-disk.img /private/nfs/noam/disk.img
      (100.00/100%)
      real 0m31.466s
      user 0m0.035s
      sys 0m0.044s
      root@nassouli-thinkpadp1gen7 /private $ time qemu-img convert -S 0 -t writeback -p -O raw cirros-0.5.2-x86_64-disk.img /private/nfs/noam/disk.img
      (100.00/100%)
      real 0m23.400s
      user 0m0.041s
      sys 0m0.044s
       

      In the runs shown above, falloc was unexpectedly the slowest method when unsupported by the underlying filesystem, taking about 3x longer than preallocation=full (94.3 s vs 31.5 s) and about 4x longer than -S 0 (94.3 s vs 23.4 s).
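
The slowdown factors can be recomputed directly from the `real` times in the transcript. This is a small sketch; `parse_real` is a hypothetical helper that only handles the `XmY.YYYs` format printed by `time` above.

```python
import re


def parse_real(value):
    """Convert a `time` duration such as '1m34.269s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognized duration: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# `real` values copied from the three runs in the transcript.
real = {
    "falloc": parse_real("1m34.269s"),
    "full": parse_real("0m31.466s"),
    "S0": parse_real("0m23.400s"),
}

ratio_vs_full = real["falloc"] / real["full"]
ratio_vs_s0 = real["falloc"] / real["S0"]
print(f"falloc vs full: {ratio_vs_full:.1f}x")
print(f"falloc vs -S 0: {ratio_vs_s0:.1f}x")
```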

      We would expect qemu-img to handle cases where fallocate is unsupported more gracefully, perhaps falling back to a method comparable to preallocation=full rather than the extremely slow glibc fallback currently encountered.

      Root Cause Analysis:
      strace -f shows that fallocate(2) is unavailable on this target: the call fails with EOPNOTSUPP.

      [pid 791095] fallocate(8, 0, 0, 117440512) = -1 EOPNOTSUPP (Operation not supported)
      ......

      QEMU calls posix_fallocate, and when fallocate(2) fails, the glibc fallback implementation of posix_fallocate allocates space by writing a single zero byte into every 4 KB block (at the last byte of each block, as the offsets in the trace below show).

      This results in image_size / 4 KB separate write operations (28,672 for the 117440512-byte allocation above), and this large number of tiny writes causes the severe performance degradation.

      Strace snippet showing 1-byte writes every 4KB:

      [pid 791095] pwrite64(8, "\0", 1, 4095) = 1
      [pid 791095] pwrite64(8, "\0", 1, 8191) = 1
      [pid 791095] pwrite64(8, "\0", 1, 12287) = 1
      [pid 791095] pwrite64(8, "\0", 1, 16383) = 1
      [pid 791095] pwrite64(8, "\0", 1, 20479) = 1
      [pid 791095] pwrite64(8, "\0", 1, 24575) = 1
      [pid 791095] pwrite64(8, "\0", 1, 28671) = 1
      [pid 791095] pwrite64(8, "\0", 1, 32767) = 1
      [pid 791095] pwrite64(8, "\0", 1, 36863) = 1
      ......... 
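
To put a number on this pattern, the following sketch reproduces the offsets seen in the trace and counts the writes for the 117440512-byte length passed to the failed fallocate() call. The 4096-byte block size is read off the trace, and 64 KB is the buffer size this report attributes to preallocation=full. `fallback_offsets` is a hypothetical helper, not glibc code, and it assumes the length is a multiple of the block size.

```python
def fallback_offsets(length, block_size=4096):
    """Offsets of the 1-byte writes: the last byte of each block in [0, length).

    Assumes length is a multiple of block_size, as in the trace.
    """
    return range(block_size - 1, length, block_size)


LENGTH = 117440512  # length argument of the failed fallocate() call

offsets = fallback_offsets(LENGTH)
first_three = list(offsets[:3])        # matches the first pwrite64 offsets
writes_4k = len(offsets)               # one 1-byte write per 4 KB block
writes_64k = -(-LENGTH // (64 * 1024)) # ceiling division for 64 KB buffers

print(first_three)
print(writes_4k, writes_64k, writes_4k // writes_64k)
```

Under these assumptions the fallback issues 16 times as many write calls as a 64 KB zero-fill of the same range.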

      By comparison:

      • preallocation=full uses 64KB buffers (resulting in significantly fewer write operations than the 4KB fallback).
      • -S 0 skips the preallocation step entirely and just converts every block, resulting in the least total I/O in this scenario.

      Conclusion:
      qemu-img should ideally detect this inefficient fallback scenario and use a better method (such as the 64 KB buffered writes used by preallocation=full) when native fallocate is unsupported.
      Alternatively, a new option that "works well in both cases" (smart auto-selection of the fastest available preallocation method) would be beneficial to avoid manual pre-checks of target storage capabilities.
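
The kind of fallback proposed here could look roughly like the sketch below. It is an illustration in Python, not QEMU code: `native_fallocate` and `preallocate` are hypothetical names, the direct libc call assumes 64-bit Linux (where off_t is 64 bits wide), and the zero-fill path is only safe on a freshly created file because it overwrites existing data.

```python
import ctypes
import errno
import os


def native_fallocate(fd, offset, length):
    """Call fallocate(2) with mode 0 directly, bypassing posix_fallocate."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                               ctypes.c_int64, ctypes.c_int64]
    libc.fallocate.restype = ctypes.c_int
    if libc.fallocate(fd, 0, offset, length) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def preallocate(fd, length, allocate=native_fallocate, chunk=64 * 1024):
    """Try fallocate first; if unsupported, write zeros in 64 KB chunks.

    Returns the name of the method that was used.
    """
    try:
        allocate(fd, 0, length)
        return "fallocate"
    except OSError as exc:
        # Only fall back when the filesystem lacks the operation.
        if exc.errno not in (errno.EOPNOTSUPP, errno.ENOSYS):
            raise
    zeros = bytes(chunk)
    offset = 0
    while offset < length:
        n = min(chunk, length - offset)
        offset += os.pwrite(fd, zeros[:n], offset)
    return "zero-fill"
```

The `allocate` parameter exists only so the fallback path can be exercised on a filesystem that does support fallocate, for example by passing a function that raises `OSError(errno.EOPNOTSUPP, ...)`.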

        1. falloc_output.txt
          1.73 MB
        2. full_output.txt
          609 kB

              hreitz@redhat.com Hanna Czenczek
              rh-ee-nassouli Noam Assouline
              virt-maint virt-maint
              Tingting Mao Tingting Mao