RHEL-88561

qemu graph deadlock during job-dismiss


    • qemu-kvm-10.0.0-8.el10
    • No
    • Important
    • 2
    • rhel-virt-storage
    • ssg_virtualization
    • 24
    • 5
    • False
    • False
    • None
    • virt-storage Sprint 6, Planning backlog
    • Unspecified
    • Unspecified
    • Unspecified
    • All
    • None

      What were you trying to do that didn't work?

      Based on an upstream report from Andrey Drobyshev

      https://lists.gnu.org/archive/html/qemu-devel/2025-04/msg04421.html

      What is the impact of this issue to you?

      QEMU can deadlock while running block jobs, which can interfere with live migration.

      Please provide the package NVR for which the bug is seen:

      How reproducible is this bug?:

      Sporadic

      Steps to reproduce

      Per Andrey's email:

        1. Run QEMU:
        > SRCDIR=/path/to/srcdir
        >
        > $SRCDIR/build/qemu-system-x86_64 -enable-kvm \
        >   -machine q35 -cpu Nehalem \
        >   -name guest=alma8-vm,debug-threads=on \
        >   -m 2g -smp 2 \
        >   -nographic -nodefaults \
        >   -qmp unix:/var/run/alma8-qmp.sock,server=on,wait=off \
        >   -serial unix:/var/run/alma8-serial.sock,server=on,wait=off \
        >   -object iothread,id=iothread0 \
        >   -blockdev node-name=disk,driver=qcow2,file.driver=file,file.filename=/path/to/img/alma8.qcow2 \
        >   -device virtio-blk-pci,drive=disk,iothread=iothread0

        2. Launch IO (random reads) from within the guest:
        > nc -U /var/run/alma8-serial.sock
        > ...
        > [root@alma8-vm ~]# fio --name=randread --ioengine=libaio --direct=1 --bs=4k --size=1G --numjobs=1 --time_based=1 --runtime=300 --group_reporting --rw=randread --iodepth=1 --filename=/testfile

        3. Run snapshots creation & removal of lower snapshot operation in a
        loop (script attached):
        > while /bin/true ; do ./remove_lower_snap.sh ; done

      where remove_lower_snap.sh is:

        #!/bin/bash

        SRCDIR=/path/to/srcdir
        STORDIR=/path/to/img
        SNAP1=$STORDIR/snap1.qcow2
        SNAP2=$STORDIR/snap2.qcow2
        QMPSHELL=$SRCDIR/scripts/qmp/qmp-shell
        QMPSOCK=/var/run/alma8-qmp.sock

        function qmp_filter()
        {
            sed -r '/^(Welcome|Connected)/d'
        }

        function waitjob()
        {
            jobid=$1
            while /bin/true ; do
                qbjout=$($QMPSHELL -p $QMPSOCK <<EOF
                    query-block-jobs
        EOF
                )
                jobstatus=$(echo "$qbjout" | grep '"status"' | head -1 | awk '{print $2}' | sed 's/[",]//g')

                if [ "x${jobstatus}" == "xready" ] ; then
                    echo -e "\n######### Complete job $jobid #########\n"
                    $QMPSHELL -p $QMPSOCK <<EOF | qmp_filter
                        job-complete id=$jobid
        EOF
               elif [ "x${jobstatus}" == "xconcluded" ] ; then
                    echo -e "\n######### Dismiss job $jobid #########\n"
                    $QMPSHELL -p $QMPSOCK <<EOF | qmp_filter
                        job-dismiss id=$jobid
        EOF
               elif [ "x${jobstatus}" == "x" ] ; then
                    break
                fi

            sleep 0.5
            done
        }

        echo -e "\n######### Create snapshot images #########\n"

        qemu-img create -f qcow2 $SNAP1 16G
        qemu-img create -f qcow2 $SNAP2 16G

        echo -e "\n######### Create 1st snapshot #########\n"

        $QMPSHELL -p $QMPSOCK <<EOF | qmp_filter
            blockdev-add driver=qcow2 node-name=snap1 file={"driver":"file","filename":"$SNAP1"}
            blockdev-snapshot node=disk overlay=snap1
        EOF

        echo -e "\n######### Create 2nd snapshot #########\n"

        $QMPSHELL -p $QMPSOCK <<EOF | qmp_filter
            blockdev-add driver=qcow2 node-name=snap2 file={"driver":"file","filename":"$SNAP2"}
            blockdev-snapshot node=snap1 overlay=snap2
        EOF

        echo -e "\n######### Commit lower snapshot #########\n"

        $QMPSHELL -p $QMPSOCK <<EOF | qmp_filter
            block-commit device=snap2 top-node=snap1 base-node=disk auto-finalize=true auto-dismiss=false job-id=commit-snap1
        EOF

        waitjob commit-snap1

        echo -e "\n######### Commit remaining snapshot #########\n"

        $QMPSHELL -p $QMPSOCK <<EOF | qmp_filter
            block-commit device=snap2 top-node=snap2 base-node=disk auto-finalize=true auto-dismiss=false job-id=commit-snap2
        EOF

        waitjob commit-snap2

        echo -e "\n######### Remove unneeded snapshot nodes #########\n"

        $QMPSHELL -p $QMPSOCK <<EOF | qmp_filter
            blockdev-del node-name=snap1
            blockdev-del node-name=snap2
        EOF

        echo -e "\n######### Done! #########\n"
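
      The status extraction inside waitjob can be tried in isolation. The sketch below reuses the same grep/awk/sed pipeline from the script. The function name first_job_status and the sample text are hypothetical: the sample is only shaped like pretty-printed query-block-jobs output and was not captured from a real run.

```shell
#!/bin/bash
# Hypothetical helper: same pipeline as waitjob, reading the QMP output on stdin.
first_job_status() {
    # Take the first line containing "status", print its second field,
    # then strip the surrounding quotes and the trailing comma.
    grep '"status"' | head -1 | awk '{print $2}' | sed 's/[",]//g'
}

# Hypothetical sample (illustrative only, not real output).
sample='{
    "return": [
        {
            "device": "commit-snap1",
            "status": "ready",
            "type": "commit"
        }
    ]
}'

printf '%s\n' "$sample" | first_job_status
```

      Note that the pipeline reports the status of the first job listed, not of the job named by jobid. That is sufficient here because the script runs one commit job at a time. An empty result means no job is listed, which is the condition waitjob uses to leave its loop.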


      Expected results

      No hang

      Actual results

      Running the script in a loop can hit a deadlock.
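
      Because the failure is sporadic, it may help to bound each iteration of the loop so that a stall is noticed. The sketch below is not part of the original report. It assumes that a deadlocked QEMU stops answering QMP, so that remove_lower_snap.sh blocks indefinitely. The helper name run_bounded and the 120-second limit are hypothetical. It relies on GNU coreutils timeout, which exits with status 124 when the time limit is reached.

```shell
#!/bin/bash
# Hypothetical helper: run a command under a time limit and flag a timeout.
# Usage: run_bounded SECONDS COMMAND [ARGS...]
run_bounded() {
    local limit=$1
    shift
    local rc=0
    timeout "$limit" "$@" || rc=$?
    if [ "$rc" -eq 124 ]; then
        # timeout(1) returns 124 when it had to kill the command.
        echo "command exceeded ${limit}s: possible deadlock" >&2
    fi
    return "$rc"
}

# Example use with the reproducer loop (the limit is an assumed value):
#   while run_bounded 120 ./remove_lower_snap.sh ; do : ; done
```

      The loop stops at the first iteration that times out or fails, leaving the QEMU process in place for inspection.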

              kwolf@redhat.com Kevin Wolf
              eblake_redhat Eric Blake
              virt-maint virt-maint
              Qinghua Cheng Qinghua Cheng