RHEL-1761

[kvm-qemu] QEMU hangs when running block jobs on multiple disks (all bound to the same iothread)

    • qemu-kvm-8.2.0-1.el9
    • Major
    • Regression, CustomerScenariosInitiative
    • sst_virtualization_storage
    • ssg_virtualization
    • 3
    • QE ack
    • False
    • Red Hat Enterprise Linux

      What were you trying to do that didn't work?

QEMU hangs when running block jobs on multiple disks (all bound to the same iothread).

       

Please provide the package NVR for which the bug is seen:

kernel version: 5.14.0-351.el9.x86_64

qemu-kvm version: qemu-kvm-8.1.0-0.el9.preview

      How reproducible:

      100%

      Steps to reproduce

1. Start the guest with the following qemu command line:

      /usr/libexec/qemu-kvm \
           -S \
           -name 'avocado-vt-vm1'  \
           -sandbox on  \
     -blockdev '{"node-name": "file_ovmf_code", "driver": "file", "filename": "/usr/share/OVMF/OVMF_CODE.secboot.fd", "auto-read-only": true, "discard": "unmap"}' \
     -blockdev '{"node-name": "drive_ovmf_code", "driver": "raw", "read-only": true, "file": "file_ovmf_code"}' \
     -blockdev '{"node-name": "file_ovmf_vars", "driver": "file", "filename": "/root/avocado/data/avocado-vt/avocado-vt-vm1_rhel930-64-virtio-scsi-ovmf_qcow2_filesystem_VARS.raw", "auto-read-only": true, "discard": "unmap"}' \
     -blockdev '{"node-name": "drive_ovmf_vars", "driver": "raw", "read-only": false, "file": "file_ovmf_vars"}' \
     -machine q35,pflash0=drive_ovmf_code,pflash1=drive_ovmf_vars,memory-backend=mem-machine_mem \
     -device '{"id": "pcie-root-port-0", "driver": "pcie-root-port", "multifunction": true, "bus": "pcie.0", "addr": "0x1", "chassis": 1}' \
     -device '{"id": "pcie-pci-bridge-0", "driver": "pcie-pci-bridge", "addr": "0x0", "bus": "pcie-root-port-0"}'  \
           -nodefaults \
     -device '{"driver": "VGA", "bus": "pcie.0", "addr": "0x2"}' \
           -m 30720 \
     -object '{"size": 32212254720, "id": "mem-machine_mem", "qom-type": "memory-backend-ram"}'  \
           -smp 10,maxcpus=10,cores=5,threads=1,dies=1,sockets=2  \
           -cpu 'Cascadelake-Server',ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,umip=on,pku=on,md-clear=on,stibp=on,flush-l1d=on,arch-capabilities=on,xsaves=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,rdctl-no=on,ibrs-all=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,tsx-ctrl=on,fb-clear=on,hle=off,rtm=off,kvm_pv_unhalt=on \
           -chardev socket,server=on,wait=off,path=/var/tmp/avocado_4nitfgtn/monitor-qmpmonitor1-20230830-035235-lnWcIJxr,id=qmp_id_qmpmonitor1  \
           -mon chardev=qmp_id_qmpmonitor1,mode=control \
           -chardev socket,server=on,wait=off,path=/var/tmp/avocado_4nitfgtn/monitor-catch_monitor-20230830-035235-lnWcIJxr,id=qmp_id_catch_monitor  \
           -mon chardev=qmp_id_catch_monitor,mode=control \
     -device '{"ioport": 1285, "driver": "pvpanic", "id": "idivUWJq"}' \
           -chardev socket,server=on,wait=off,path=/var/tmp/avocado_4nitfgtn/serial-serial0-20230830-035235-lnWcIJxr,id=chardev_serial0 \
     -device '{"id": "serial0", "driver": "isa-serial", "chardev": "chardev_serial0"}'  \
           -chardev socket,id=seabioslog_id_20230830-035235-lnWcIJxr,path=/var/tmp/avocado_4nitfgtn/seabios-20230830-035235-lnWcIJxr,server=on,wait=off \
           -device isa-debugcon,chardev=seabioslog_id_20230830-035235-lnWcIJxr,iobase=0x402 \
     -device '{"id": "pcie-root-port-1", "port": 1, "driver": "pcie-root-port", "addr": "0x1.0x1", "bus": "pcie.0", "chassis": 2}' \
     -device '{"driver": "qemu-xhci", "id": "usb1", "bus": "pcie-root-port-1", "addr": "0x0"}' \
     -device '{"driver": "usb-tablet", "id": "usb-tablet1", "bus": "usb1.0", "port": "1"}' \
     -object '{"qom-type": "iothread", "id": "iothread0"}' \
     -device '{"id": "pcie-root-port-2", "port": 2, "driver": "pcie-root-port", "addr": "0x1.0x2", "bus": "pcie.0", "chassis": 3}' \
     -device '{"id": "virtio_scsi_pci0", "driver": "virtio-scsi-pci", "bus": "pcie-root-port-2", "addr": "0x0", "iothread": "iothread0"}' \
     -blockdev '{"node-name": "file_image1", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "threads", "filename": "/home/kvm_autotest_root/images/rhel930-64-virtio-scsi-ovmf.qcow2", "cache": {"direct": true, "no-flush": false}}' \
     -blockdev '{"node-name": "drive_image1", "driver": "qcow2", "read-only": false, "cache": {"direct": true, "no-flush": false}, "file": "file_image1"}' \
     -device '{"driver": "scsi-hd", "id": "image1", "drive": "drive_image1", "write-cache": "on"}' \
     -blockdev '{"node-name": "file_data1", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "threads", "filename": "/root/avocado/data/avocado-vt/data1.qcow2", "cache": {"direct": true, "no-flush": false}}' \
     -blockdev '{"node-name": "drive_data1", "driver": "qcow2", "read-only": false, "cache": {"direct": true, "no-flush": false}, "file": "file_data1"}' \
     -device '{"driver": "scsi-hd", "id": "data1", "drive": "drive_data1", "write-cache": "on", "serial": "DATA_DISK1"}' \
     -blockdev '{"node-name": "file_data2", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "threads", "filename": "/root/avocado/data/avocado-vt/data2.qcow2", "cache": {"direct": true, "no-flush": false}}' \
     -blockdev '{"node-name": "drive_data2", "driver": "qcow2", "read-only": false, "cache": {"direct": true, "no-flush": false}, "file": "file_data2"}' \
     -device '{"driver": "scsi-hd", "id": "data2", "drive": "drive_data2", "write-cache": "on", "serial": "DATA_DISK2"}' \
     -device '{"id": "pcie-root-port-3", "port": 3, "driver": "pcie-root-port", "addr": "0x1.0x3", "bus": "pcie.0", "chassis": 4}' \
     -device '{"driver": "virtio-net-pci", "mac": "9a:22:61:10:fc:be", "id": "idIlMuBG", "netdev": "id7Fqouz", "bus": "pcie-root-port-3", "addr": "0x0"}'  \
           -netdev tap,id=id7Fqouz,vhost=on,vhostfd=16,fd=12  \
           -vnc :0  \
           -rtc base=utc,clock=host,driftfix=slew  \
           -boot menu=off,order=cdn,once=c,strict=off \
           -chardev socket,id=char_vtpm_avocado-vt-vm1_tpm0,path=/root/avocado/data/avocado-vt/swtpm/avocado-vt-vm1_tpm0_swtpm.sock \
           -tpmdev emulator,chardev=char_vtpm_avocado-vt-vm1_tpm0,id=emulator_vtpm_avocado-vt-vm1_tpm0 \
     -device '{"id": "tpm-crb_vtpm_avocado-vt-vm1_tpm0", "tpmdev": "emulator_vtpm_avocado-vt-vm1_tpm0", "driver": "tpm-crb"}' \
           -enable-kvm \
     -device '{"id": "pcie_extra_root_port_0", "driver": "pcie-root-port", "multifunction": true, "bus": "pcie.0", "addr": "0x3", "chassis": 5}'

2. Resume the VM:

      {"execute": "cont", "id": "jNqV2P7F"} 
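For reference, the QMP commands in this and the following steps are issued on the monitor socket defined in step 1 (qmp_id_qmpmonitor1). A minimal manual session is sketched below (socket path copied from the command line above; ncat assumed to be available); QMP only accepts commands after the qmp_capabilities handshake:

#connect to the QMP monitor socket from step 1
nc -U /var/tmp/avocado_4nitfgtn/monitor-qmpmonitor1-20230830-035235-lnWcIJxr
#after the server greeting, negotiate capabilities once, then send the commands from each step
{"execute": "qmp_capabilities"}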

3. In the guest, format the two data disks (sda/sdb) and write data to them:

(guest)# parted -s /dev/sda mklabel msdos
(guest)# parted -s /dev/sda mkpart primary 0M 2048.0M
(guest)# yes | mkfs.ext4 -F /dev/sda1
(guest)# mkdir /mnt/sda1 && mount -t ext4 /dev/sda1 /mnt/sda1
(guest)# dd if=/dev/urandom of=/mnt/sda1/qKMS bs=1M count=500 oflag=direct
(guest)# md5sum /mnt/sda1/qKMS > /mnt/sda1/qKMS.md5 && sync
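The second data disk gets the same treatment; a compact equivalent for both disks is sketched below (assuming they enumerate as /dev/sda and /dev/sdb in the guest; qKMS is just the test's arbitrary payload file name):

(guest)# for d in sda sdb; do
             parted -s /dev/$d mklabel msdos
             parted -s /dev/$d mkpart primary 0M 2048.0M
             yes | mkfs.ext4 -F /dev/${d}1
             mkdir -p /mnt/${d}1 && mount -t ext4 /dev/${d}1 /mnt/${d}1
             dd if=/dev/urandom of=/mnt/${d}1/qKMS bs=1M count=500 oflag=direct
             md5sum /mnt/${d}1/qKMS > /mnt/${d}1/qKMS.md5 && sync
         done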
      

4. Create the target nodes drive_data1sn and drive_data2sn and take live snapshots onto them:

#create target node1: drive_data1sn
{"execute": "blockdev-create", "arguments": {"options": {"driver": "file", "filename": "/root/avocado/data/avocado-vt/data1sn.qcow2", "size": 2147483648}, "job-id": "file_data1sn"}, "id": "efxtZrNJ"}
{"execute": "job-dismiss", "arguments": {"id": "file_data1sn"}, "id": "pbFDJlYi"}
{"execute": "blockdev-add", "arguments": {"node-name": "file_data1sn", "driver": "file", "filename": "/root/avocado/data/avocado-vt/data1sn.qcow2", "aio": "threads", "auto-read-only": true, "discard": "unmap"}, "id": "ZsPpCgGh"}
{"execute": "blockdev-create", "arguments": {"options": {"driver": "qcow2", "file": "file_data1sn", "size": 2147483648}, "job-id": "drive_data1sn"}, "id": "w0GiKhh7"}
{"execute": "job-dismiss", "arguments": {"id": "drive_data1sn"}, "id": "XvEL2jrC"}
{"execute": "blockdev-add", "arguments": {"node-name": "drive_data1sn", "driver": "qcow2", "file": "file_data1sn", "read-only": false}, "id": "ZsPpCgGh"}
      
#create target node2: drive_data2sn
      {"execute": "blockdev-create", "arguments": {"options": {"driver": "file", "filename": "/root/avocado/data/avocado-vt/data2sn.qcow2", "size": 2147483648}, "job-id": "file_data2sn"}, "id": "sJIWAx4i"}
      {"execute": "job-dismiss", "arguments": {"id": "file_data2sn"}, "id": "XSQ6BJwv"}
      {"execute": "blockdev-add", "arguments": {"node-name": "file_data2sn", "driver": "file", "filename": "/root/avocado/data/avocado-vt/data2sn.qcow2", "aio": "threads", "auto-read-only": true, "discard": "unmap"}, "id": "8s87kdhR"}
      {"execute": "blockdev-create", "arguments": {"options": {"driver": "qcow2", "file": "file_data2sn", "size": 2147483648}, "job-id": "drive_data2sn"}, "id": "h5OIzotn"}
      {"execute": "job-dismiss", "arguments": {"id": "drive_data2sn"}, "id": "eVzDxzfe"}
      {"execute": "blockdev-add", "arguments": {"node-name": "drive_data2sn", "driver": "qcow2", "file": "file_data2sn", "read-only": false}, "id": "SUr6YsB4"}
      
      
       #create snapshot chains
      {"execute": "blockdev-snapshot", "arguments": {"node": "drive_data1", "overlay": "drive_data1sn"}, "id": "gB0BqzVp"}
      {"execute": "blockdev-snapshot", "arguments": {"node": "drive_data2", "overlay": "drive_data2sn"}, "id": "sxDYFRjS"}
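As an optional sanity check (not part of the original reproducer), the node graph can be queried at this point to confirm that drive_data1sn/drive_data2sn are now the active overlays on top of drive_data1/drive_data2:

#optional: list all named block nodes and their backing files
{"execute": "query-named-block-nodes", "id": "checkNodes"}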

5. Run block-stream on drive_data2sn:

      {"execute": "block-stream", "arguments": {"device": "drive_data2sn", "job-id": "drive_data2sn_IPS8"}, "id": "nbHHCwRL"} 
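Not part of the original reproducer, but if more time is needed for the check in step 6, block-stream also accepts an optional speed argument (bytes per second) to throttle the job, e.g.:

{"execute": "block-stream", "arguments": {"device": "drive_data2sn", "job-id": "drive_data2sn_IPS8", "speed": 10485760}, "id": "nbHHCwRL"}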

6. While the stream job is running, check the block job status:

      {"execute": "query-jobs", "id": "IYClQwF6"} 
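For reference, on a working build this returns the running job roughly as below (values are illustrative); once the bug triggers, the monitor stops responding and no reply arrives at all:

{"return": [{"id": "drive_data2sn_IPS8", "type": "stream", "status": "running", "current-progress": 1073741824, "total-progress": 2147483648}]}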

      Expected results

The stream job completes successfully.

      Actual results

QEMU hangs during the stream job; gdb output as below:

      #gdb -p 471025
      GNU gdb (GDB) Red Hat Enterprise Linux 10.2-11.el9
      Copyright (C) 2021 Free Software Foundation, Inc.
      License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
      Type "show copying" and "show warranty" for details.
      This GDB was configured as "x86_64-redhat-linux-gnu".
      Type "show configuration" for configuration details.
      For bug reporting instructions, please see:
      <https://www.gnu.org/software/gdb/bugs/>.
      Find the GDB manual and other documentation resources online at:
          <http://www.gnu.org/software/gdb/documentation/>.

      For help, type "help".
      Type "apropos word" to search for commands related to "word".
      Attaching to process 471025
      [New LWP 471026]
      [New LWP 471027]
      [New LWP 471032]
      [New LWP 471033]
      [New LWP 471034]
      [New LWP 471035]
      [New LWP 471036]
      [New LWP 471037]
      [New LWP 471038]
      [New LWP 471039]
      [New LWP 471040]
      [New LWP 471041]
      [New LWP 471042]
      [New LWP 471044]
      [Thread debugging using libthread_db enabled]
      Using host libthread_db library "/lib64/libthread_db.so.1".
      0x00007fcd09d42abe in ppoll () from /lib64/libc.so.6
      Missing separate debuginfos, use: dnf debuginfo-install qemu-kvm-core-8.1.0-0.el9.preview.x86_64
      (gdb) bt
      #0  0x00007fcd09d42abe in ppoll () at /lib64/libc.so.6
      #1  0x0000564cdbd13882 in fdmon_poll_wait.llvm ()
      #2  0x0000564cdbd12de6 in aio_poll ()
      #3  0x0000564cdbb4a4e1 in bdrv_graph_wrlock ()
      #4  0x0000564cdbb115cb in bdrv_replace_child_noperm.llvm ()
      #5  0x0000564cdbb11491 in bdrv_root_unref_child ()
      #6  0x0000564cdbb3f1c3 in blk_unref ()
      #7  0x0000564cdbbc0060 in stream_clean ()
      #8  0x0000564cdbb261a1 in job_finalize_single_locked.llvm ()
      #9  0x0000564cdbb24d1c in job_do_finalize_locked.llvm ()
      #10 0x0000564cdbb265bb in job_exit ()
      #11 0x0000564cdbd2de31 in aio_bh_poll ()
      #12 0x0000564cdbd122d4 in aio_dispatch ()
      #13 0x0000564cdbd2f28f in aio_ctx_dispatch ()
      #14 0x00007fcd0a4a5e2f in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
      #15 0x0000564cdbd301ee in main_loop_wait ()
      #16 0x0000564cdb8691e7 in qemu_main_loop ()
      #17 0x0000564cdb6bdc6a in qemu_default_main ()
      #18 0x00007fcd09c3feb0 in __libc_start_call_main () at /lib64/libc.so.6
      #19 0x00007fcd09c3ff60 in __libc_start_main_impl () at /lib64/libc.so.6
      #20 0x0000564cdb6bd3d5 in _start ()
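Only the main thread's stack is shown above; since both data disks share iothread0, the iothread's stack is also relevant to see what it is polling or waiting on. It can be captured from the same gdb session (ideally after installing the debuginfo package gdb suggests above):

(gdb) set pagination off
(gdb) thread apply all bt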

Note: We previously hit the same issue, which has been resolved:

      Bug 2190368 - Qemu hang when do block jobs on multiple disks(all bind to the same iothread)

            kwolf@redhat.com Kevin Wolf
            aliang@redhat.com Aihua Liang
            virt-maint virt-maint
            Aihua Liang Aihua Liang
