RHEL-129540

Assertion failure on drain with iothread and I/O load


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Undefined
    • rhel-10.2
    • rhel-10.2
    • qemu-kvm / Storage
    • rhel-virt-storage

      This bug was first reported upstream: https://lists.gnu.org/archive/html/qemu-block/2025-11/msg00336.html

      The root cause is a race after drained_end. blk_root_drained_end(), running in the main thread, wakes up a queued request in an iothread, but blk_wait_while_drained() only resumes in the iothread and calls blk_inc_in_flight() some time later. If the BlockBackend is drained again during this window, the drain does not wait for this request, so the request sneaks in when the BlockBackend is already supposed to be quiesced. This causes assertion failures in bdrv_drain_all_begin() and can have other unintended consequences.
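
The interleaving can be walked through with a small toy model. This is only a sketch and not QEMU code: the variables and function names below are hypothetical stand-ins for the in-flight counter, the quiesced state, and the wake-up of the queued request, and the steps run sequentially in one fixed order to show the problematic ordering.

```shell
#!/bin/bash
# Toy model of the race window (hypothetical names, not QEMU code).

in_flight=0   # requests that a drain has to wait for
quiesced=0    # 1 while the BlockBackend is supposed to be drained
woken=0       # 1 after the queued request was woken, before it is counted again
violation=0   # set to 1 if a request becomes active while quiesced

drained_begin() {
    # A real drain polls until in_flight reaches 0. In this interleaving
    # the counter is already 0, so the drain returns immediately.
    if [ "$in_flight" -eq 0 ]; then
        quiesced=1
    fi
}

drained_end() {
    # Main thread: end the drained section and wake the queued request.
    quiesced=0
    woken=1
}

request_resumes() {
    # iothread: the woken request finally runs and counts itself again.
    woken=0
    in_flight=$((in_flight + 1))
    if [ "$quiesced" -eq 1 ]; then
        violation=1
    fi
}

drained_begin     # first drain; a new request arrives and is queued
drained_end       # wake-up is issued, request not yet counted
drained_begin     # second drain sees in_flight == 0 and does not wait
request_resumes   # request sneaks in while quiesced

echo "in_flight=$in_flight quiesced=$quiesced violation=$violation"
```

The final state has a request in flight while the model is still marked as quiesced, which corresponds to the condition that the assertion in the drain code checks for.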

      I reproduced it with the following test script on qemu-kvm-10.1.0-5.el10:

      #!/bin/bash
      
      qmp() {
      read -n 1 -s -p "Press any key to continue..." >&2
      
      cat <<EOF
      {"execute": "qmp_capabilities"}
      EOF
      
      SNAPNODE=empty-format
      
      for i in {1..50} ; do
      cat <<EOF
          {"execute":"blockdev-add", "arguments": {"node-name":"snap${i}-storage","driver":"file","filename":"/tmp/snap${i}.qcow2","aio":"native","cache":{"no-flush":false,"direct":true}}}
          {"execute":"blockdev-add", "arguments": {"node-name":"snap${i}-format","driver":"qcow2","file":"snap${i}-storage"}}
          {"execute":"blockdev-snapshot", "arguments": {"node":"$SNAPNODE","overlay":"snap${i}-format"}}
      EOF
      SNAPNODE=snap${i}-format
      done
      }
      
      ./qemu-img create -f qcow2 /tmp/disk.qcow2 64G
      for i in {1..50} ; do
          ./qemu-img create -f qcow2 /tmp/snap${i}.qcow2 64G
      done
      
      qmp | ./qemu-system-x86_64 -enable-kvm -M q35 -cpu host -m 4G -qmp stdio \
          -object iothread,id=iothread1 \
          -blockdev node-name=empty-storage,driver=file,filename=/tmp/disk.qcow2,aio=native,cache.direct=on \
          -blockdev node-name=empty-format,driver=qcow2,file=empty-storage \
          -device virtio-blk-pci,drive=empty-format,iothread=iothread1,id=virtio-disk1 \
          -cdrom /home/kwolf/images/iso/rhel-9.3-x86_64-dvd.iso \
          -boot d
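
The script leaves 51 image files behind in /tmp. A minimal cleanup sketch could look like the following; the function name and the optional directory argument are hypothetical additions, while the file names match the ones the script creates.

```shell
#!/bin/bash
# Hypothetical helper: remove the images created by the reproducer.
# The directory defaults to /tmp, matching the paths used in the script.
cleanup_images() {
    local dir="${1:-/tmp}"
    local i
    rm -f "$dir/disk.qcow2"
    for i in {1..50}; do
        rm -f "$dir/snap${i}.qcow2"
    done
}

# Usage, after QEMU has exited:
#   cleanup_images
```

Only the files named by the reproducer are removed, so other files in the same directory are left alone.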
      
      1. Run the script to create the images and start QEMU
      2. Inside the guest (booted into a rescue system from a RHEL ISO in my case), run the following:
        for i in $(seq 0 32); do dd if=/dev/zero of=/dev/vda bs=4k oflag=direct & done
        
      3. Then press enter on the terminal running the above script in order to start QMP activity
      4. The qemu-kvm process crashes with an assertion failure:
        qemu-system-x86_64: ../block/io.c:441: void bdrv_drain_assert_idle(BlockDriverState *): Assertion `qatomic_read(&bs->in_flight) == 0' failed.
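
The wording around the assertion depends on the locale of the host, so when checking a captured stderr log for this crash it is more robust to match the asserted expression itself. A minimal sketch, assuming stderr of the QEMU process was redirected to a file (the helper name and the log file are hypothetical):

```shell
#!/bin/bash
# Hypothetical helper: succeed if the given log contains the drain assertion.
hit_drain_assertion() {
    local log="$1"
    # -F matches a fixed string, so '(', '&' and '>' are taken literally.
    # -q suppresses output; only the exit status is used.
    grep -qF 'qatomic_read(&bs->in_flight) == 0' "$log"
}

# Usage, with qemu.log being wherever stderr was redirected to:
#   if hit_drain_assertion qemu.log; then echo "reproduced"; fi
```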
        

      The same bug exists in older RHEL versions, but it is not exposed by the same reproducer because drain operations are called in different places there.

              Kevin Wolf (kwolf@redhat.com)
              virt-maint
              qing wang