  RHEL-3934

[qemu-kvm] Failure on repeated hotplug/unplug of a disk with iothread enabled

    • qemu-kvm-8.2.0-6.el9
    • Yes
    • Critical
    • Regression, CustomerScenariosInitiative
    • rhel-sst-virtualization-storage
    • ssg_virtualization
    • 25
    • 27
    • 5
    • QE ack

      Repeatedly hotplugging/unplugging disks with an iothread enabled fails.
      It also sometimes corrupts the guest's system data and leaves it inaccessible.

      Package NVRs on which the bug is seen:
      Red Hat Enterprise Linux release 9.4 Beta (Plow)
      5.14.0-363.el9.x86_64
      qemu-kvm-8.1.0-1.el9.x86_64
      seabios-bin-1.16.1-1.el9.noarch
      edk2-ovmf-20230524-3.el9.noarch
      libvirt-9.5.0-6.el9.x86_64

      How reproducible:
      30%

      Steps to reproduce

      1. Boot a VM:
      
       /usr/libexec/qemu-kvm \
           -name 'avocado-vt-vm1'  \
           -sandbox on  \
           -machine q35,memory-backend=mem-machine_mem \
           -device '{"id": "pcie-root-port-0", "driver": "pcie-root-port", "multifunction": true, "bus": "pcie.0", "addr": "0x1", "chassis": 1}' \
           -device '{"id": "pcie-pci-bridge-0", "driver": "pcie-pci-bridge", "addr": "0x0", "bus": "pcie-root-port-0"}'  \
           -nodefaults \
           -device '{"driver": "VGA", "bus": "pcie.0", "addr": "0x2"}' \
           -m 30720 \
           -object '{"size": 32212254720, "id": "mem-machine_mem", "qom-type": "memory-backend-ram"}'  \
           -smp 10,maxcpus=10,cores=5,threads=1,dies=1,sockets=2  \
           -cpu 'Cascadelake-Server',ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,umip=on,pku=on,md-clear=on,stibp=on,flush-l1d=on,arch-capabilities=on,xsaves=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,rdctl-no=on,ibrs-all=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,tsx-ctrl=on,fb-clear=on,hle=off,rtm=off,kvm_pv_unhalt=on \
           -device '{"id": "pcie-root-port-1", "port": 1, "driver": "pcie-root-port", "addr": "0x1.0x1", "bus": "pcie.0", "chassis": 2}' \
           -device '{"driver": "qemu-xhci", "id": "usb1", "bus": "pcie-root-port-1", "addr": "0x0"}' \
           -device '{"driver": "usb-tablet", "id": "usb-tablet1", "bus": "usb1.0", "port": "1"}' \
           -object '{"qom-type": "iothread", "id": "iothread0"}' \
           -object '{"qom-type": "iothread", "id": "iothread1"}' \
           -device '{"id": "pcie-root-port-2", "port": 2, "driver": "pcie-root-port", "addr": "0x1.0x2", "bus": "pcie.0", "chassis": 3}' \
           -device '{"id": "virtio_scsi_pci0", "driver": "virtio-scsi-pci", "bus": "pcie-root-port-2", "addr": "0x0", "iothread": "iothread0"}' \
           -blockdev '{"node-name": "file_image1", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "threads", "filename": "/home/kvm_autotest_root/images/rhel940-64-virtio-scsi.qcow2", "cache": {"direct": true, "no-flush": false}}' \
           -blockdev '{"node-name": "drive_image1", "driver": "qcow2", "read-only": false, "cache": {"direct": true, "no-flush": false}, "file": "file_image1"}' \
           -device '{"driver": "scsi-hd", "id": "image1", "drive": "drive_image1", "bootindex": 0, "write-cache": "on"}' \
           -device '{"id": "pcie-root-port-3", "port": 3, "driver": "pcie-root-port", "addr": "0x1.0x3", "bus": "pcie.0", "chassis": 4}' \
           -device '{"driver": "virtio-net-pci", "mac": "9a:21:b2:e0:6d:4b", "id": "idwbiXqo", "netdev": "iduN1RGA", "bus": "pcie-root-port-3", "addr": "0x0"}'  \
           -netdev tap,id=iduN1RGA,vhost=on \
           -vnc :5 \
           -monitor stdio \
           -qmp tcp:0:5955,server=on,wait=off \
           -rtc base=utc,clock=host,driftfix=slew  \
           -boot menu=off,order=cdn,once=c,strict=off \
           -enable-kvm \
           -device '{"id": "pcie_extra_root_port_0", "driver": "pcie-root-port", "multifunction": true, "bus": "pcie.0", "addr": "0x3", "chassis": 5}' \
           -chardev socket,id=socket-serial,path=/var/tmp/socket-serial,logfile=/var/tmp/file-serial.log,mux=on,server=on,wait=off \
           -serial chardev:socket-serial \
           -chardev file,path=/var/tmp/file-bios.log,id=file-bios \
           -device isa-debugcon,chardev=file-bios,iobase=0x402 \
       
      2. Hot-plug a disk (the virtio-scsi controller first, then the blockdev nodes and the scsi-hd device):
      {"execute": "device_add", "arguments":{"id": "virtio_scsi_pci1", "driver": "virtio-scsi-pci", "bus": "pcie_extra_root_port_0", "addr": "0x0", "iothread": "iothread1"}, "id": "MSVjNpxq"}
      {"execute": "blockdev-add", "arguments": {"node-name": "file_stg0", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "threads", "filename": "/home/kvm_autotest_root/images/storage0.qcow2", "cache": {"direct": true, "no-flush": false}}, "id": "uNRhq5lR"}
      {"execute": "blockdev-add", "arguments": {"node-name": "drive_stg0", "driver": "qcow2", "read-only": false, "cache":{"direct": true, "no-flush": false}, "file": "file_stg0"}, "id": "VtvrjLkH"}
      {"execute": "device_add", "arguments":
      {"driver": "scsi-hd", "id": "stg0", "bus": "virtio_scsi_pci1.0", "drive": "drive_stg0", "write-cache": "on"}
      , "id": "pWLotHhW"}
      
      3. Check the disk in the guest; it should have been added:
      lsblk
      
      4. Unplug the disk:
      {"execute": "device_del", "arguments":{"id": "stg0"}, "id": "R81C1q8p"}
      {"execute": "blockdev-del", "arguments":{"node-name": "drive_stg0"}, "id": "EAiI2LVe"}
      {"execute": "blockdev-del", "arguments":{"node-name": "file_stg0"}, "id": "IJRbWqTG"}
      {"execute": "device_del", "arguments":{"id": "virtio_scsi_pci1"}, "id": "w6mivmt2"}
      
      5. Check the disk in the guest; it should have been removed:
      lsblk
      
      6. Repeat steps 2-5 100 times (a scripted loop such as the sketch below can drive steps 2-5).
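
      The report does not include the loop script itself; the following is a minimal sketch (not the original avocado-vt automation) that drives steps 2-5 over the QMP socket opened above with "-qmp tcp:0:5955,server=on,wait=off". The host/port, sleep durations, and the assumption that storage0.qcow2 already exists (it could be created with "qemu-img create -f qcow2 /home/kvm_autotest_root/images/storage0.qcow2 1G") are illustrative, and the QMP arguments are abridged from the commands listed above.

       #!/usr/bin/env python3
       # Minimal sketch only -- not the original test automation.  Loops the
       # hotplug/unplug steps above via the QMP socket (-qmp tcp:0:5955).
       # Assumes storage0.qcow2 already exists on the host.
       import json
       import socket
       import time

       def qmp_cmd(sock, reader, execute, arguments=None):
           """Send one QMP command and return its response, skipping async events."""
           msg = {"execute": execute}
           if arguments is not None:
               msg["arguments"] = arguments
           sock.sendall(json.dumps(msg).encode() + b"\n")
           while True:
               reply = json.loads(reader.readline())
               if "event" not in reply:      # ignore DEVICE_DELETED etc.
                   return reply

       sock = socket.create_connection(("127.0.0.1", 5955))
       reader = sock.makefile("r")
       json.loads(reader.readline())         # QMP greeting
       qmp_cmd(sock, reader, "qmp_capabilities")

       for i in range(100):
           # Step 2: controller, blockdev nodes, scsi-hd (arguments abridged)
           qmp_cmd(sock, reader, "device_add", {"id": "virtio_scsi_pci1",
                   "driver": "virtio-scsi-pci", "bus": "pcie_extra_root_port_0",
                   "addr": "0x0", "iothread": "iothread1"})
           qmp_cmd(sock, reader, "blockdev-add", {"node-name": "file_stg0",
                   "driver": "file",
                   "filename": "/home/kvm_autotest_root/images/storage0.qcow2"})
           qmp_cmd(sock, reader, "blockdev-add", {"node-name": "drive_stg0",
                   "driver": "qcow2", "file": "file_stg0"})
           qmp_cmd(sock, reader, "device_add", {"driver": "scsi-hd", "id": "stg0",
                   "bus": "virtio_scsi_pci1.0", "drive": "drive_stg0"})
           time.sleep(5)                     # step 3: run lsblk in the guest here
           # Step 4: unplug in the same order as the report
           qmp_cmd(sock, reader, "device_del", {"id": "stg0"})
           time.sleep(5)                     # let the guest finish the unplug
           qmp_cmd(sock, reader, "blockdev-del", {"node-name": "drive_stg0"})
           qmp_cmd(sock, reader, "blockdev-del", {"node-name": "file_stg0"})
           qmp_cmd(sock, reader, "device_del", {"id": "virtio_scsi_pci1"})
           time.sleep(5)                     # step 5: run lsblk in the guest again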

       

      Expected results
      Disk hotplug/unplug succeeds on every iteration, and the guest can verify that the disk appears after hotplug and disappears after unplug.

      Actual results

      Hotplug fails, or the guest OS crashes.

      The issue is not hit on qemu-kvm-8.0.0-13.el9.x86_64.

      The issue is not hit without an iothread.


            Errata Tool added a comment -

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory (Moderate: qemu-kvm security update), and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHSA-2024:2135


            Aihua Liang added a comment -

            Per the latest comment, the issue is not hit on qemu-kvm-8.2.0-6.el9.x86_64, so I will set the status to "Release Pending" and track the new issue in RHEL-26193.


            Miroslav Rezanina added a comment -

            Fix included in qemu-kvm-8.2.0-6.el9.

            Fixed by merge request 'virtio: Re-enable notifications after drain' ( https://gitlab.com/redhat/centos-stream/src/qemu-kvm/-/merge_requests/223 )

            Hanna Czenczek added a comment -

            Upstream patches have been merged (commits c42c3833e0cfdf2b80fb3ca410acfd392b6874ab, 5bdbaebcce18fe6a627cafad2043ec08f3de5744, 52bff01f64eec017ffb0d5903a0ee1d67ca7a548), but the backport is blocked because they (at least the last commit) depend on MR 219.

            Hanna Czenczek added a comment - edited

            I have a downstream build for what I think is a fix: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=58399774 (yes, the same one as for RHEL-7356), but it may be hard to test with virtio-scsi because unplugging virtio-scsi disks currently not too rarely causes an assertion to fail and qemu to crash (cf. https://lists.nongnu.org/archive/html/qemu-block/2024-01/msg00317.html).

            (Edit: Fixed brew link. Somehow I managed to apparently kill qemu in the previous build, breaking it.)


            Aihua Liang added a comment -

            Reproduced it on v8.2.0-rc4; qemu stops responding to blockdev-add. gdb info:

            (root@host~)#gdb -p 814870

             (gdb) bt
            #0  futex_wait (private=0, expected=2, futex_word=0x5600cffb67b0) at ../sysdeps/nptl/futex-internal.h:146
            #1  __GI___lll_lock_wait (futex=futex@entry=0x5600cffb67b0, private=0) at lowlevellock.c:50
            #2  0x00007f90e72a4d5d in lll_mutex_lock_optimized (mutex=0x5600cffb67b0) at pthread_mutex_lock.c:49
            #3  ___pthread_mutex_lock (mutex=0x5600cffb67b0) at pthread_mutex_lock.c:129
            #4  0x00005600ccf264bb in qemu_mutex_lock_impl (mutex=0x5600cffb67b0, file=0x5600cd1f3779 "../util/async.c", line=728) at ../util/qemu-thread-posix.c:94
            #5  0x00005600ccf266eb in qemu_rec_mutex_lock_impl (mutex=0x5600cffb67b0, file=0x5600cd1f3779 "../util/async.c", line=728) at ../util/qemu-thread-posix.c:149
            #6  0x00005600ccf407d1 in aio_context_acquire (ctx=0x5600cffb6750) at ../util/async.c:728
            #7  0x00005600ccdc3537 in bdrv_drain_all_end () at ../block/io.c:593
            #8  0x00005600ccdc0b62 in bdrv_graph_wrlock (bs=0x0) at ../block/graph-lock.c:157
            #9  0x00005600ccd869b2 in bdrv_open_child
                (filename=0x0, options=0x5600d09a7030, bdref_key=0x5600cd1a74e4 "file", parent=0x5600d009d690, child_class=0x5600cdd58560 <child_of_bds>, child_role=19, allow_none=false, errp=0x7ffd04e51d68) at ../block.c:3793
            #10 0x00005600ccd86ab3 in bdrv_open_file_child
                (filename=0x0, options=0x5600d09a7030, bdref_key=0x5600cd1a74e4 "file", parent=0x5600d009d690, errp=0x7ffd04e51d68) at ../block.c:3822
            #11 0x00005600ccde26ab in qcow2_open (bs=0x5600d009d690, options=0x5600d09a7030, flags=8226, errp=0x7ffd04e51d68) at ../block/qcow2.c:1943
            #12 0x00005600ccd8208c in bdrv_open_driver
                (bs=0x5600d009d690, drv=0x5600cde8e280 <bdrv_qcow2>, node_name=0x5600d1dee920 "drive_stg6", options=0x5600d09a7030, open_flags=8226, errp=0x7ffd04e51e80)
                at ../block.c:1646
            #13 0x00005600ccd82cb5 in bdrv_open_common (bs=0x5600d009d690, file=0x5600d0c1de30, options=0x5600d09a7030, errp=0x7ffd04e51e80) at ../block.c:1977

            #14 0x00005600ccd87990 in bdrv_open_inherit
                (filename=0x0, reference=0x0, options=0x5600d09a7030, flags=8194, parent=0x0, child_class=0x0, child_role=0, errp=0x7ffd04e52088) at ../block.c:4165
            #15 0x00005600ccd87fc7 in bdrv_open (filename=0x0, reference=0x0, options=0x5600d09a6010, flags=0, errp=0x7ffd04e52088) at ../block.c:4272
            #16 0x00005600ccd6ec44 in bds_tree_init (bs_opts=0x5600d09a6010, errp=0x7ffd04e52088) at ../blockdev.c:681
            #17 0x00005600ccd75a6d in qmp_blockdev_add (options=0x7ffd04e520c0, errp=0x7ffd04e52088) at ../blockdev.c:3534
            #18 0x00005600cce9129a in qmp_marshal_blockdev_add (args=0x7f8dc80091b0, ret=0x7f90e68ead98, errp=0x7f90e68ead90) at qapi/qapi-commands-block-core.c:1459
            #19 0x00005600ccf15203 in do_qmp_dispatch_bh (opaque=0x7f90e68eae30) at ../qapi/qmp-dispatch.c:128
            #20 0x00005600ccf3f8ab in aio_bh_call (bh=0x7f8dc8008220) at ../util/async.c:169
            #21 0x00005600ccf3f9c6 in aio_bh_poll (ctx=0x5600cfd3fad0) at ../util/async.c:216
            #22 0x00005600ccf22571 in aio_dispatch (ctx=0x5600cfd3fad0) at ../util/aio-posix.c:423
            #23 0x00005600ccf3fe05 in aio_ctx_dispatch (source=0x5600cfd3fad0, callback=0x0, user_data=0x0) at ../util/async.c:358
            #24 0x00007f90e7607f4f in g_main_dispatch (context=0x5600cfd3fee0) at ../glib/gmain.c:3364
            #25 g_main_context_dispatch (context=0x5600cfd3fee0) at ../glib/gmain.c:4079
            #26 0x00005600ccf41370 in glib_pollfds_poll () at ../util/main-loop.c:290
            #27 0x00005600ccf413ed in os_host_main_loop_wait (timeout=0) at ../util/main-loop.c:313
            #28 0x00005600ccf414fb in main_loop_wait (nonblocking=0) at ../util/main-loop.c:592
            #29 0x00005600cca9dd53 in qemu_main_loop () at ../system/runstate.c:782
            #30 0x00005600ccd2ba81 in qemu_default_main () at ../system/main.c:37
            #31 0x00005600ccd2babc in main (argc=126, argv=0x7ffd04e52568) at ../system/main.c:48

             

            After applying Stefan's upstream patch, qemu still does not respond to blockdev-add, but with different gdb info:

            qemu version: v8.2.0-rc4-3-g9cd37b841c

            (root@host~)# gdb -p 822638

            (gdb) bt
            #0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
            #1  0x00005627f76b40f5 in qemu_futex_wait (f=0x7ffea68b4b18, val=4294967295) at /home/qemu/include/qemu/futex.h:29
            #2  0x00005627f76b42dc in qemu_event_wait (ev=0x7ffea68b4b18) at ../util/qemu-thread-posix.c:464
            #3  0x00005627f76aedef in aio_set_fd_handler_remote
                (ctx=0x5627f8ff3e90, fd=77, io_read=0x0, io_write=0x0, io_poll=0x0, io_poll_ready=0x0, opaque=0x7f487be5437c) at ../util/aio-posix.c:245
            #4  0x00005627f76aeed6 in aio_set_fd_handler (ctx=0x5627f8ff3e90, fd=77, io_read=0x0, io_write=0x0, io_poll=0x0, io_poll_ready=0x0, opaque=0x7f487be5437c)
                at ../util/aio-posix.c:277
            #5  0x00005627f76aef82 in aio_set_event_notifier (ctx=0x5627f8ff3e90, notifier=0x7f487be5437c, io_read=0x0, io_poll=0x0, io_poll_ready=0x0)
                at ../util/aio-posix.c:302
            #6  0x00005627f741a4f5 in virtio_queue_aio_detach_host_notifier (vq=0x7f487be54308, ctx=0x5627f8ff3e90) at ../hw/virtio/virtio.c:3583
            #7  0x00005627f73b7cea in virtio_blk_drained_begin (opaque=0x5627fa21eb60) at ../hw/block/virtio-blk.c:1523
            #8  0x00005627f75423dc in blk_root_drained_begin (child=0x5627f9d5df50) at ../block/block-backend.c:2820
            #9  0x00005627f754f01c in bdrv_parent_drained_begin_single (c=0x5627f9d5df50) at ../block/io.c:131
            #10 0x00005627f754ed8e in bdrv_parent_drained_begin (bs=0x5627f901ca90, ignore=0x0) at ../block/io.c:62
            #11 0x00005627f754fc29 in bdrv_do_drained_begin (bs=0x5627f901ca90, parent=0x0, poll=false) at ../block/io.c:383
            #12 0x00005627f754fdc8 in bdrv_do_drained_begin_quiesce (bs=0x5627f901ca90, parent=0x0) at ../block/io.c:405
            #13 0x00005627f750de93 in bdrv_child_cb_drained_begin (child=0x5627f9014a60) at ../block.c:1205
            #14 0x00005627f754f01c in bdrv_parent_drained_begin_single (c=0x5627f9014a60) at ../block/io.c:131
            #15 0x00005627f754ed8e in bdrv_parent_drained_begin (bs=0x5627f9016280, ignore=0x0) at ../block/io.c:62
            #16 0x00005627f754fc29 in bdrv_do_drained_begin (bs=0x5627f9016280, parent=0x0, poll=false) at ../block/io.c:383
            #17 0x00005627f755029d in bdrv_drain_all_begin_nopoll () at ../block/io.c:531
            #18 0x00005627f754da7b in bdrv_graph_wrlock (bs=0x5627f9be18d0) at ../block/graph-lock.c:131
            #19 0x00005627f753dc74 in blk_insert_bs (blk=0x5627fa83f520, bs=0x5627f9be18d0, errp=0x7f4891ab2d90) at ../block/block-backend.c:940

            #20 0x00005627f6fe44b1 in set_drive_helper (obj=0x5627f93dd880, v=
                0x5627f9b58800, name=0x5627fb24fc40 "drive", opaque=0x5627f8577580 <virtio_blk_properties>, iothread=false, errp=0x7f4891ab2d90)
                at ../hw/core/qdev-properties-system.c:152
            #21 0x00005627f6fe4665 in set_drive (obj=0x5627f93dd880, v=
                0x5627f9b58800, name=0x5627fb24fc40 "drive", opaque=0x5627f8577580 <virtio_blk_properties>, errp=0x7f4891ab2d90)
                at ../hw/core/qdev-properties-system.c:194
            #22 0x00005627f74b9ca8 in field_prop_set (obj=0x5627f93dd880, v=
                0x5627f9b58800, name=0x5627fb24fc40 "drive", opaque=0x5627f8577580 <virtio_blk_properties>, errp=0x7f4891ab2d90) at ../hw/core/qdev-properties.c:88
            #23 0x00005627f74c5a0f in object_property_set (obj=0x5627f93dd880, name=0x5627fb24fc40 "drive", v=0x5627f9b58800, errp=0x7f4891ab2d90)
                at ../qom/object.c:1435
            #24 0x00005627f74c89df in property_set_alias (obj=0x5627f93d54b0, v=0x5627fa341760, name=0x5627fb3e8f00 "drive", opaque=0x5627fb24fc20, errp=0x7f4891ab2d90)
                at ../qom/object.c:2746
            #25 0x00005627f74c5a0f in object_property_set (obj=0x5627f93d54b0, name=0x5627fb3e8f00 "drive", v=0x5627fa341760, errp=0x7f4891ab2d90)
                at ../qom/object.c:1435
            #26 0x00005627f74c907a in object_set_properties_from_qdict (obj=0x5627f93d54b0, qdict=0x5627faee4e20, v=0x5627fa341760, errp=0x7f4891ab2d90)
                at ../qom/object_interfaces.c:55
            #27 0x00005627f74c9160 in object_set_properties_from_keyval (obj=0x5627f93d54b0, qdict=0x5627faee4e20, from_json=false, errp=0x7f4891ab2d90)
                at ../qom/object_interfaces.c:73
            #28 0x00005627f72243d4 in qdev_device_add_from_qdict (opts=0x5627fb5120c0, from_json=false, errp=0x7f4891ab2d90) at ../system/qdev-monitor.c:713
            #29 0x00005627f7224495 in qdev_device_add (opts=0x5627fa842c20, errp=0x7f4891ab2d90) at ../system/qdev-monitor.c:738
            #30 0x00005627f7224aca in qmp_device_add (qdict=0x7f4574008120, ret_data=0x7f4891ab2d98, errp=0x7f4891ab2d90) at ../system/qdev-monitor.c:860

            #31 0x00005627f76a2203 in do_qmp_dispatch_bh (opaque=0x7f4891ab2e30) at ../qapi/qmp-dispatch.c:128
            #32 0x00005627f76ccb31 in aio_bh_call (bh=0x7f4574006d80) at ../util/async.c:169
            #33 0x00005627f76ccc4c in aio_bh_poll (ctx=0x5627f8d7dad0) at ../util/async.c:216
            #34 0x00005627f76af80b in aio_dispatch (ctx=0x5627f8d7dad0) at ../util/aio-posix.c:515
            #35 0x00005627f76cd08b in aio_ctx_dispatch (source=0x5627f8d7dad0, callback=0x0, user_data=0x0) at ../util/async.c:358
            #36 0x00007f48927ecf4f in g_main_dispatch (context=0x5627f8d7dee0) at ../glib/gmain.c:3364
            #37 g_main_context_dispatch (context=0x5627f8d7dee0) at ../glib/gmain.c:4079
            #38 0x00005627f76ce5d2 in glib_pollfds_poll () at ../util/main-loop.c:290
            #39 0x00005627f76ce64f in os_host_main_loop_wait (timeout=0) at ../util/main-loop.c:313
            #40 0x00005627f76ce75d in main_loop_wait (nonblocking=0) at ../util/main-loop.c:592
            #41 0x00005627f722ad53 in qemu_main_loop () at ../system/runstate.c:782
            #42 0x00005627f74b8a81 in qemu_default_main () at ../system/main.c:37
            #43 0x00005627f74b8abc in main (argc=126, argv=0x7ffea68b5708) at ../system/main.c:48


            Stefan Hajnoczi added a comment -

            I have posted an RFC patch series that invokes io_poll_end() from the AioContext's home thread:
            https://lore.kernel.org/qemu-devel/20231213211544.1601971-1-stefanha@redhat.com/T/#t

            Stefan Hajnoczi added a comment -

            > Regardless of where we re-enable notifications, we also need to “kick” the virt queue once to process all the requests we ignored during the drained section.

            The code already has a way of doing this in cases other than removing the AioHandler. When polling ends there is a final check to see if any requests snuck in:

            static void virtio_queue_host_notifier_aio_poll_end(EventNotifier *n)
            {
                VirtQueue *vq = container_of(n, VirtQueue, host_notifier);
            
                /* Caller polls once more after this to catch requests that race with us */
                virtio_queue_set_notification(vq, 1);
            }
            

            and in util/aio-posix.c:poll_set_started() (and also in remove_idle_poll_handlers()):

              /* Poll one last time in case ->io_poll_end() raced with the event */
              if (!started && node->io_poll(node->opaque)) {
                  aio_add_poll_ready_handler(ready_list, node);
                  progress = true;
              }
            

            This ensures that nothing is left pending after ->poll_end() has been called.

            I'll think about the aio_set_fd_handler() remove case and whether it can be simplified.


            Hanna Czenczek added a comment - edited

            Sorry, didn’t manage to send an upstream patch before my PTO. But here’s what I plan to do when coming back:

            I think it’s fine not to force-call io_poll_end() when the notifier is removed and io_poll_begin() has been called. The polling section is not being left, so it would not necessarily be correct to invoke that callback blindly. Also, it is difficult to do correctly, as explained. The only user of these callbacks is (as far as I can tell) virtio_queue_aio_attach_host_notifier() anyway, so we can just document it and work with the problem there.

            While it is tempting to force-enable notifications directly when the notifier is removed (to replace the call to virtio_queue_host_notifier_aio_poll_end() that never happened), no caller probably actually wants this. The event notifiers are removed precisely so that request processing stops (as far as I can tell at least), so why would we tell the guest to send notifications? Ideally, we only enable notifications when the notifiers are attached (i.e. in virtio_queue_aio_attach_host_notifier{,_no_poll}()), because only then will anyone care. We could even disable notifications at the end of virtio_queue_aio_detach_host_notifier() because of this.

            Regardless of where we re-enable notifications, we also need to “kick” the virt queue once to process all the requests we ignored during the drained section. I don’t think virtio_queue_aio_attach_host_notifier{,_no_poll}() should do this, because most callers probably don’t want that, so we should do this in virtio_{scsi,blk}_drained_end(). I think it’s sufficient to simply call virtio_queue_notify_vq(vq) after the virtio_queue_aio_attach_host_notifier(vq, ctx) call, because both virtio-scsi’s and virtio-blk’s .handle_output() implementations acquire the device’s context, so this should be directly callable from any context.
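
            For illustration only, here is a rough sketch of what that plan could look like in virtio_scsi_drained_end(), mirroring the loop from the virtio_scsi_drained_begin() diff quoted in a later comment. This is an outline of the idea described in the comment above, not the fix that was eventually merged; the struct/field lookups (container_of() / VIRTIO_SCSI_COMMON()) are approximations.

            static void virtio_scsi_drained_end(SCSIBus *bus)
            {
                /* Sketch of the plan above, not the merged patch. */
                VirtIOSCSI *s = container_of(bus, VirtIOSCSI, bus);
                VirtIODevice *vdev = VIRTIO_DEVICE(s);
                uint32_t total_queues = VIRTIO_SCSI_VQ_NUM_FIXED +
                                        VIRTIO_SCSI_COMMON(s)->conf.num_queues;

                for (uint32_t i = 0; i < total_queues; i++) {
                    VirtQueue *vq = virtio_get_queue(vdev, i);

                    /* Re-attach the host notifier to the device's iothread context... */
                    virtio_queue_aio_attach_host_notifier(vq, s->ctx);
                    /* ...then kick the queue once so requests the guest queued while
                     * notifications were disabled during the drained section get
                     * processed. */
                    virtio_queue_notify_vq(vq);
                }
            }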


            Hanna Czenczek added a comment - edited

            Bisecting led to commit 766aa2de0f29b657148e04599320d771c36fd126 (“virtio-scsi: implement BlockDevOps->drained_begin()”), which is the commit that introduced virtio_scsi_drained_begin(), i.e. the function that detaches the notifiers (which leads to the hang).

            That probably doesn’t help much, except that it tells me that I should question that function as a whole, too.

            I think there are many issues here, and maybe fixing any one of them could fix this very issue, but in the end, all of them should be fixed.
            1. It seems like virtio_queue_aio_attach_host_notifier() doesn’t have the intended effect immediately (comment-23294023), and so virtio_scsi_drained_end() (also added in 766aa2de0f) doesn’t immediately continue processing the virtqueue.
            2. For event notifiers, io_poll_begin() is not always followed up with io_poll_end(). Fixing this is not trivial, because it must be done in the right AioContext, but still synchronously before aio_set_event_notifier() returns, or we have a potential use-after-free.
            3. There are probably use-after-free problems in that whole notifier code anyway, but they shouldn’t affect or cause this issue here.
            4. When the BB is drained during reset, it does not yet belong to virtio-scsi’s AioContext. While this may be fine in theory, it is a recipe for uncovering multithreading bugs, which in practice just exist. It’s also weird to me in general that the BB is attached to the virtio-scsi device (so that blk_drain() will affect that device), but it is not moved to its context yet.

            I think comment-23286597 is right in that as a quick-fix, we could work around issue #2 by manually running the io_poll_end() callback, but I think this should be done in virtio_scsi_drained_end() instead of virtio_scsi_drained_begin() – we don’t want the notifications in the drained section after all. Also, as I noted in comment-23290013, I believe we should follow this up by polling the virtqueue once, just enabling notifications won’t help. Indeed, adding virtio_queue_set_notification(vq, 1) and virtio_queue_notify_vq() (in virtio-scsi’s AioContext, i.e. potentially in a BH) to virtio_scsi_drained_end() for each queue seems to resolve the problem.

            However, that is just a quick-fix. The real problem is (A) that io_poll_begin() is not necessarily followed up by io_poll_end(), and (B) even then, the notifier is removed in virtio_scsi_drained_begin(), i.e., we don’t want the virtqueue to be processed then, so that will be suppressed until virtio_scsi_drained_end(), and just having notifications enabled won’t resume processing the virtqueue, so I think even if io_poll_begin() is always followed up by io_poll_end(), virtio_scsi_drained_end() will still need to run virtio_queue_notify_vq() manually.


            Hanna Czenczek added a comment -

            A problem that is visible only with Stefan’s suggestion applied (not because of it, but because otherwise it hangs much more often) seems to be that the virtqueue notifiers aren’t added to the poll_aio_handlers. That is, virtio_queue_aio_attach_host_notifier() is called, aio_set_fd_handler() adds the notifiers to aio_handlers, but then aio_dispatch_handler() never sees that node. As far as I understand from the code, we expect glib to call aio_ctx_dispatch() at some point to invoke aio_dispatch_handler() on the added node, which would add it to poll_aio_handlers, but that never happens.

            In such a case, the hang is not resolved after ten minutes.


            Hanna Czenczek added a comment - edited

            Unfortunately, that doesn’t work; I think after enabling notifications, we must also poll once to see whether any new requests have come in up to that point. Doing that from the main thread feels very wrong, though.
            The other thing is that because of threading, the order can get messed up. (Though I’ve only seen that when calling io_poll_end() in aio_remove_fd_handler(), not if we delay enabling notifications until after virtio_queue_aio_detach_host_notifier() is fully done, but could still be possible, I think.) aio_remove_fd_handler() (which is what’s eventually called by virtio_queue_aio_detach_host_notifier()) will not actually remove the node when it is still in use by another thread, but only put it in the deleted_aio_handlers list. So the following order is possible:

            • [iothread] poll_set_started(true) runs, locks the list_lock
            • [iothread] checks whether the node is in the deleted list, it isn’t
            • [main thread] aio_remove_fd_handler() sees list_lock locked, puts the node in the deleted list, returns
            • [main thread] aio_set_fd_handler(), aio_set_event_notifier(), virtio_queue_aio_detach_host_notifier() return
            • [main thread] We enable notifications
            • [iothread] poll_set_started() runs node->io_poll_begin(), disabling notifications

            To me, it feels important that we only run node->io_poll_end() when the node is actually about to be free()-ed, not when it is simply marked as deleted. This would potentially use node->opaque after aio_set_fd_handler() was called to detach the notifier, which is probably wrong (it is supplied by the caller, and naïvely, I would assume that after detaching, this object can be safely discarded), but so far we haven’t cared about that either, given that aio_remove_fd_handler() seems to have made the very conscious decision not to lock the list_lock and instead just put the node up for lazy deletion if the lock is taken.

            Maybe aio_remove_fd_handler() should lock the list_lock, and the deleted_aio_handlers list should not exist. It feels very dangerous to play this game (a manual implementation of lazy garbage collection) in a language like C, without a notion of strong references, and it leads to the TOCTTOU problem described above (poll_set_started() checks whether the node is marked for deletion, but this can change before it actually calls io_poll_begin()).

            Still, even if we locked it, we still need to / should call io_poll() once after io_poll_end(), and we should do so in the thread of the context to which the node belongs.


            Stefan Hajnoczi added a comment - edited
            > 1. The BlockBackend is not in virtio-scsi’s AioContext when the drain happens, so locking the context will not stop it. I wonder whether locking the context will ensure that we don’t lock it between poll_set_started(true) and poll_set_started(false), though.
            
            The AioContext lock is going away upstream. If current users are broken, they need to be fixed, but let's avoid using the AioContext lock for new things.
            
            > 2. Nothing ensures that when io_poll_begin() has been called on a removed handler, something will call io_poll_end() on it. Should we do that? remove_idle_poll_handlers() seems to try. However, aio_remove_fd_handler() is called from the main thread, not virtio-scsi’s thread (is that a problem?), and also, in our case, it runs in parallel to poll_set_started(true), so if you just call io_poll_end() there, it can easily happen before poll_set_started()’s io_poll_begin().
            
            I think you've found the root cause: aio_set_fd_handler() does not call ->io_poll_end(), leaving virtqueue notifications disabled.
            
            Does the following hack solve the issue?
            ```
            diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
            index 45b95ea070..72bfddeee9 100644
            --- a/hw/scsi/virtio-scsi.c
            +++ b/hw/scsi/virtio-scsi.c
            @@ -1141,6 +1141,7 @@ static void virtio_scsi_drained_begin(SCSIBus *bus)
                 for (uint32_t i = 0; i < total_queues; i++) {
                     VirtQueue *vq = virtio_get_queue(vdev, i);
                     virtio_queue_aio_detach_host_notifier(vq, s->ctx);
            +        virtio_queue_set_notification(vq, 1);
                 }
             }
            ```
            
            In a proper solution ->io_poll_end() needs to be called from the AioContext thread to avoid thread-safety issues. I don't know the best way to do that off the top of my head, although it will probably use either aio_notify() (which we already have in aio_set_fd_handler()) or a BH.
            
            > 3. When virtio_queue_aio_attach_host_notifier() is called in a polling phase, the installed virtio_queue_host_notifier_aio_poll_end() handler will not be called, by design. Should it manually enable virtqueue notifications and/or check the virtqueue for new descriptors? (Same problem as in 2, virtio_queue_aio_attach_host_notifier() is run by the main thread, not virtio-scsi’s thread.)
            
            If detach calls ->io_poll_end(), then it won't be necessary to modify the attach behavior. The reason why ->io_poll_begin()/->io_poll_end() are not called after attach is that the virtqueue notifier's fd must first become ready once before it is added to the ctx->poll_aio_handlers list. Since the virtqueue has notifications disabled, that never happens...
            


            Hanna Czenczek added a comment -

            What seems to happen is the following:
            1. We attach the virtqueue notifiers (event virtqueue).
            2. The virtio-scsi device is attached, the guest scans all LUNs, i.e. sends a lot of virtio-scsi requests.
            3. While these requests happen, the scsi-hd device is attached, resetting it (scsi_disk_reset()). Note that device_set_realized() resets the device before hot-plugging it, i.e. before virtio_scsi_hotplug() would move the BlockBackend to virtio-scsi’s AioContext, leaving it in the main context throughout the following points.
            4. Resetting lands in blk_drain(), which calls blk->dev_ops->drained_begin(), i.e. scsi_disk_drained_begin(), which calls virtio_scsi_drained_begin(). This function detaches all virtqueue notifiers.
            5. While detaching, a request comes in, handled in the I/O thread. We enter polling mode, poll_set_started(true) calls virtio_queue_host_notifier_aio_poll_begin(), disabling notifications on the virtqueue. Then, the notifier is detached/deleted.
            6. When polling mode is left via poll_set_started(false), the notifier has often been already set up again (by the main thread), but it seems to me like the polling code is set up so it will not call io_poll_end() on something where it hasn’t called io_poll_begin() before. (The notifier has been removed and re-added, so it’s technically a different one.) Therefore, it will not call virtio_queue_host_notifier_aio_poll_end() and the virtqueue notifications remain disabled.

            I haven’t waited the full ten minutes, but I suspect some event eventually comes in that results in virtio_queue_host_notifier_aio_poll_begin()+end() being called, re-enabling notifications and resuming processing of the virtqueue.

            shajnocz@redhat.com: From my perspective, there are multiple angles here. I need your input on what you think is actually a problem and should be fixed (maybe none of them, and you have a completely different idea):
            1. The BlockBackend is not in virtio-scsi’s AioContext when the drain happens, so locking the context will not stop it. I wonder whether locking the context will ensure that we don’t lock it between poll_set_started(true) and poll_set_started(false), though.
            2. Nothing ensures that when io_poll_begin() has been called on a removed handler, something will call io_poll_end() on it. Should we do that? remove_idle_poll_handlers() seems to try. However, aio_remove_fd_handler() is called from the main thread, not virtio-scsi’s thread (is that a problem?), and also, in our case, it runs in parallel to poll_set_started(true), so if you just call io_poll_end() there, it can easily happen before poll_set_started()’s io_poll_begin().
            3. When virtio_queue_aio_attach_host_notifier() is called in a polling phase, the installed virtio_queue_host_notifier_aio_poll_end() handler will not be called, by design. Should it manually enable virtqueue notifications and/or check the virtqueue for new descriptors? (Same problem as in 2, virtio_queue_aio_attach_host_notifier() is run by the main thread, not virtio-scsi’s thread.)


            Again, there is no crash in the log, only stack traces for processes that hang in the kernel.


            Hanna Czenczek added a comment - - edited

            Yup, this has helped a great deal. I found that when adding the virtio-scsi device, Linux enumerates all LUNs. During that time, we add the scsi-hd device as LUN 0; because the enumeration is still ongoing, it won’t be found immediately. More importantly, after the scsi-hd device is added, something disables notifications, so the virtqueue will no longer be polled, and the guest can’t find the device. (Polling resumes after those ten minutes.)

            When the scsi-hd is added, it is reset. During the reset, because we are using an iothread*, requests to the virtio-scsi device can still be processed. I think the reset can interfere with polling. When poll_set_started(true) is called, it is in the ready queue, so virtio_queue_host_notifier_aio_poll_begin() is called and disables notifications, but it is not in poll_set_started(false)’s ready queue, so virtio_queue_host_notifier_aio_poll_end() is never called (or only after ten minutes).

            As far as I can tell, this node (which belongs to the virtio-scsi device) enters the ready list only once the scsi-hd device is added. So I think the problem comes from the fact that the scsi-hd device is hot-plugged concurrently with polling on the virtqueue (of the virtio-scsi device). Furthermore, we do take some effort to block concurrent processing, but only on the block level (here we have virtqueue processing independently of any connected block devices, because it’s device discovery), and we only move the BlockBackend to the target context in virtio_scsi_hotplug() – which is called after the reset, and the reset is when the problem occurs. So acquiring scsi-hd’s AioContext will not stop virtqueue processing (before virtio_scsi_hotplug()).

            I don’t have much experience with the poll code. I’ll need to find out where exactly the virtio-scsi node enters the ready list, and where it is removed again. (It is entered somewhere around the time of the reset, allowing virtio_queue_host_notifier_aio_poll_begin() to be called, and then it’s removed before virtio_queue_host_notifier_aio_poll_end() can be called.)

            EDIT: When I talk about the ready list / ready queue above, I meant the ctx->poll_aio_handlers list.

            *So the original report is correct and my memory was wrong: This only happens when using an iothread.
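
            To make the suspected sequence easier to follow, here is a toy model of the begin-without-end mismatch described above (illustration only, not QEMU code). It assumes only what this thread states: a handler sits on the poll list once its fd has become ready, io_poll_begin() disables virtqueue notifications, and io_poll_end() is only called for handlers still on that list.

            ```python
            # Toy model; names mirror the QEMU concepts discussed above, but this is
            # not the actual implementation.
            class VirtQueue:
                def __init__(self):
                    self.notifications_enabled = True

            vq = VirtQueue()
            poll_aio_handlers = []          # handlers whose fd has become ready

            def io_poll_begin(q):           # cf. virtio_queue_host_notifier_aio_poll_begin()
                q.notifications_enabled = False

            def io_poll_end(q):             # cf. virtio_queue_host_notifier_aio_poll_end()
                q.notifications_enabled = True

            def poll_set_started(started):
                for q in poll_aio_handlers:
                    (io_poll_begin if started else io_poll_end)(q)

            # The event virtqueue's notifier fired earlier, so it is on the poll list.
            poll_aio_handlers.append(vq)

            poll_set_started(True)          # polling begins: notifications disabled

            # scsi-hd hot-plug -> reset -> drain: the notifier is detached ...
            poll_aio_handlers.remove(vq)
            # ... and re-attached, but it only re-joins the poll list once its fd
            # fires again, which cannot happen while notifications are disabled.

            poll_set_started(False)         # vq is no longer on the list, so
                                            # io_poll_end() is never called for it

            print(vq.notifications_enabled) # False: the virtqueue stays silent
            ```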


            Thanks! I found this task:

            Oct 06 08:37:28 archlinux kernel: task:kworker/u2:5    state:D stack:0     pid:69    ppid:2      flags:0x00004000
            Oct 06 08:37:28 archlinux kernel: Workqueue: events_unbound async_run_entry_fn
            Oct 06 08:37:28 archlinux kernel: Call Trace:
            Oct 06 08:37:28 archlinux kernel:  <TASK>
            Oct 06 08:37:28 archlinux kernel:  __schedule+0x370/0x12a0
            Oct 06 08:37:28 archlinux kernel:  ? __mod_timer+0x289/0x3b0
            Oct 06 08:37:28 archlinux kernel:  schedule+0x5e/0xd0
            Oct 06 08:37:28 archlinux kernel:  schedule_timeout+0x98/0x150
            Oct 06 08:37:28 archlinux kernel:  ? __bpf_trace_tick_stop+0x10/0x10
            Oct 06 08:37:28 archlinux kernel:  io_schedule_timeout+0x50/0x80
            Oct 06 08:37:28 archlinux kernel:  wait_for_completion_io_timeout+0x83/0x170
            Oct 06 08:37:28 archlinux kernel:  blk_execute_rq+0x134/0x1d0
            Oct 06 08:37:28 archlinux kernel:  __scsi_execute+0x10b/0x2c0
            Oct 06 08:37:28 archlinux kernel:  scsi_probe_and_add_lun+0x19c/0xe00
            Oct 06 08:37:28 archlinux kernel:  __scsi_scan_target+0xf4/0x590
            Oct 06 08:37:28 archlinux kernel:  scsi_scan_channel+0x59/0x90
            Oct 06 08:37:28 archlinux kernel:  scsi_scan_host_selected+0xe3/0x120
            Oct 06 08:37:28 archlinux kernel:  do_scan_async+0x1b/0x160
            Oct 06 08:37:28 archlinux kernel:  async_run_entry_fn+0x34/0x130
            Oct 06 08:37:28 archlinux kernel:  process_one_work+0x1c7/0x380
            Oct 06 08:37:28 archlinux kernel:  worker_thread+0x51/0x390
            Oct 06 08:37:28 archlinux kernel:  ? rescuer_thread+0x3b0/0x3b0
            Oct 06 08:37:28 archlinux kernel:  kthread+0xde/0x110
            Oct 06 08:37:28 archlinux kernel:  ? kthread_complete_and_exit+0x20/0x20
            Oct 06 08:37:28 archlinux kernel:  ret_from_fork+0x22/0x30
            Oct 06 08:37:28 archlinux kernel:  </TASK>
            

            So it looks like it’s waiting for some completion from qemu after all. I’ll continue to look there.


            Ewan Milne added a comment -

            A thread dump (as mentioned above), taken during the 10 minutes while the hung-task timeout shows tasks waiting for the scan_mutex, would be helpful. Or a crash dump if you can get one.

            I have no idea why this would resolve itself after 10 minutes unless there are some timeouts involved with I/O for the scan. It would be a coincidence if this added up to exactly 10 minutes, though.


            Looks like a deadlock on `shost->scan_mutex`. Can you dump all tasks (SysRq + t) inside the guest to see which of them has a backtrace associated with SCSI scanning?
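
            For reference, a minimal sketch of one way to capture such a SysRq-t dump from inside the guest (run as root; the /tmp output path is an arbitrary choice):

            ```python
            # Trigger a SysRq-t dump of all task stacks and save the kernel log, so
            # hung tasks (e.g. those waiting on shost->scan_mutex) can be inspected.
            import subprocess

            with open("/proc/sysrq-trigger", "w") as f:
                f.write("t")   # dump stack traces of all tasks to the kernel ring buffer

            log = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
            with open("/tmp/sysrq-t.log", "w") as f:
                f.write(log)
            ```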


            Hi Ewan (I hope adding you to the watch list works just like adding someone to Cc in Bugzilla) – Vivek said you might have some insights into this issue. What I’m seeing is that when repeatedly hot-plugging and hot-unplugging a virtio-scsi device to/from a qemu VM, after several hundred iterations or so (this varies greatly), a process hangs inside the guest kernel with the following stack trace (on a hot-plug):

            [  368.281608] INFO: task kworker/0:4:263 blocked for more than 122 seconds.
            [  368.281633]       Not tainted 6.1.1-arch1-1 #1
            [  368.281639] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
            [  368.281648] task:kworker/0:4     state:D stack:0     pid:263   ppid:2      flags:0x00004000
            [  368.281652] Workqueue: events_freezable virtscsi_handle_event [virtio_scsi]
            [  368.281662] Call Trace:
            [  368.281663]  <TASK>
            [  368.281666]  __schedule+0x370/0x12a0
            [  368.281679]  schedule+0x5e/0xd0
            [  368.281681]  schedule_preempt_disabled+0x15/0x30
            [  368.281682]  __mutex_lock.constprop.0+0x399/0x700
            [  368.281685]  scsi_scan_target+0x73/0x100
            [  368.281695]  virtscsi_handle_event+0x25a/0x2b0 [virtio_scsi c1956a4901c24f4203502fdae284216ec504ddab]
            [  368.281699]  process_one_work+0x1c7/0x380
            [  368.281709]  worker_thread+0x51/0x390
            [  368.281711]  ? rescuer_thread+0x3b0/0x3b0
            [  368.281713]  kthread+0xde/0x110
            [  368.281714]  ? kthread_complete_and_exit+0x20/0x20
            [  368.281716]  ret_from_fork+0x22/0x30
            [  368.281722]  </TASK>
            

            It hangs for pretty much exactly ten minutes (exact at least to the second), then it seems to resolve: while it hangs, the hot-plugged device doesn’t appear in the guest, but afterwards it does appear, and there are no more such “blocked” messages.

            Do you know what may cause this? Is there some lingering resource that holds the lock (e.g. from a previous hot-plug/hot-unplug cycle), which for some reason is released e.g. because of a timeout of ten minutes?

            (If the device is attempted to be hot-unplugged during the hang, the hang persists beyond those ten minutes.)


            Hanna Czenczek added a comment - - edited

            Here’s my current understanding:

            First, I’m lowering the priority of this bug. If it happens only sometimes with a reproducer that runs 1000 iterations (comment-23162287), I don’t think it’s critical.

            I cannot reproduce a hang on qemu’s side (i.e. that the QMP monitor would hang). The reproducer given in comment-23162287 is not a well-behaving QMP client (does not read responses, does not await DEVICE_DELETED events), so it might be that.

            I cannot reproduce a problem with virtio-blk or with scsi-hd disks.

            I cannot reproduce a crash in the guest, and I also didn’t find a crash in the provided logs. I see stack traces, but these have not been generated due to crashes, but because a process hangs.

            What I can reproduce is a hang in the guest kernel when repeatedly hot-(un)plugging virtio-scsi devices. The hang occurs in __mutex_lock.constprop(). This is also what I see in the provided logs, where multiple processes hang there. If you wait for the device to appear in the guest, it will appear after ten minutes. If you try to hot-unplug it before then, the removing process in the kernel will also hang in __mutex_lock.constprop(), which apparently leads to a deadlock that does not resolve.

            (Edit: I want to stress that what I can reproduce is exactly what I see in the provided logs, i.e. processes hanging in __mutex_lock.constprop(). If there is data for a qemu hang (e.g. a stack trace of the hung process) or stack traces for virtio-blk, I really need that data, because I can’t reproduce those problems myself.)

            I think I was also able to reproduce the bug without using an iothread, but I’m not sure, because right now, I fail to reproduce it altogether. I will come back once I can say for sure.


            So when I try to delete the device when the kernel hasn’t recognized it, I get that message (“virtio_scsi: SCSI device 6 0 0 0 not found”) and device_del (for the virtio-scsi device) doesn’t get a DEVICE_DELETED event. While this is kind of fair (device_del is just a request to the guest, so if the guest doesn’t know the device yet, nothing will be unplugged), it is still strange that a worker thread hangs even in this case, namely in acpiphp_disable_and_eject_slot() → pci_device_remove() → virtio_dev_remove() → virtscsi_remove(). There’s another thread hanging in virtscsi_handle_event() → scsi_scan_target() → __mutex_lock.constprop, which is the one that’s just hanging in general, causing the device not to appear.

            Does the original test (comment-23030859) always wait for DEVICE_DELETED events?

            While I’ve generally only seen one or two processes hang, in the log mentioned in comment-23030876, it’s multiple: One hangs from scsi_add_device(), several hang from I/O, and one from scsi_remove_device(). It’s interesting that both scsi_add_device() and scsi_remove_device() hang at the same time, and both in __mutex_lock.constprop. I think this doesn’t look like a bug in qemu, because the guest seems to get the correct events. Perhaps there’s a deadlock in the guest kernel?


            Two more notes:

            When trying to delete the device while it has not yet appeared in the guest, the guest shows a message that it doesn’t know this device. Then, the device won’t appear even after ten minutes.

            Second, the second link in comment-23030876 doesn’t show anything crashing in the guest, as far as I can see. It shows call traces that are made for kworker threads that are hanging (that haven’t crashed), which is what I have observed, too. (However, left alone, something seems to resolve the situation after ten minutes.)


            I can reproduce part of the original problem, i.e. when hot-(un)plugging the whole virtio-scsi device, the block devices, and the scsi-hd device, the device sometimes doesn’t appear in the guest. What happens is that something in the guest hangs (some kworker messages appear), and after always exactly ten minutes, the device suddenly does appear and everything is normal then. I don’t know yet what happens. After those ten minutes, you can even hot-unplug the device, and do many more cycles until the problem re-appears and exactly the same thing happens again (ten minutes of nothing, then the device appears). I’ll test what happens if you unplug the device in that time.

            I can’t reproduce anything using the second approach (comment-23162287), it just keeps working (admittedly I haven’t tested a RHEL guest yet, though). One thing to note about the script is it doesn’t read the QMP output, which includes responses and events. This may cause problems (compare BZ 2100096).
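
            For comparison, here is a minimal sketch of a better-behaved QMP client: it reads every response and waits for the DEVICE_DELETED event before re-plugging. The socket path and the stg0/drive_stg0/scsi0.0 names are taken from the reproducer below and are assumptions about the setup, not part of the original test.

            ```python
            # Sketch of a QMP hotplug/unplug loop that consumes responses and events
            # instead of blindly piping commands through nc.
            import json
            import socket

            QMP_SOCK = "/var/tmp/socket-qmp"      # as in the reproducer's -chardev line

            sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
            sock.connect(QMP_SOCK)
            f = sock.makefile("rw", buffering=1)  # line-buffered text stream

            def recv():
                return json.loads(f.readline())

            def execute(name, arguments=None):
                f.write(json.dumps({"execute": name, "arguments": arguments or {}}) + "\n")
                while True:
                    msg = recv()
                    if "return" in msg or "error" in msg:
                        return msg                # the command's response
                    # other messages are asynchronous events; dropped here for brevity

            def wait_event(name):
                while True:
                    msg = recv()
                    if msg.get("event") == name:
                        return msg

            recv()                                # consume the QMP greeting
            execute("qmp_capabilities")

            for i in range(1000):
                execute("device_add", {"driver": "scsi-hd", "id": "stg0",
                                       "drive": "drive_stg0", "bus": "scsi0.0"})
                execute("device_del", {"id": "stg0"})
                wait_event("DEVICE_DELETED")      # do not re-plug until unplug finished
                print("iteration", i, "done")
            ```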


            qing wang added a comment -

            Reference reproducer (does not always work)

            1.boot vm
            #!/bin/bash
            /usr/libexec/qemu-kvm \
            -name 'avocado-vt-vm1' \
            -sandbox on \
            -machine pc,memory-backend=mem-machine_mem \
            -nodefaults \
            -device '\{"driver": "VGA", "bus": "pci.0", "addr": "0x2"}' \
            -m 12288 \
            -object '\{"size": 12884901888, "id": "mem-machine_mem", "qom-type": "memory-backend-ram"}' \
            -smp 10,maxcpus=10,cores=5,threads=1,dies=1,sockets=2 \
            -cpu 'Cascadelake-Server',ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,umip=on,pku=on,md-clear=on,stibp=on,flush-l1d=on,arch-capabilities=on,xsaves=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,rdctl-no=on,ibrs-all=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,tsx-ctrl=on,fb-clear=on,hle=off,rtm=off,kvm_pv_unhalt=on \
            \
            -device '\{"driver": "ich9-usb-ehci1", "id": "usb1", "addr": "0x1d.0x7", "multifunction": true, "bus": "pci.0"}' \
            -device '\{"driver": "usb-tablet", "id": "usb-tablet1", "bus": "usb1.0", "port": "1"}' \
            -object '\{"qom-type": "iothread", "id": "iothread0"}' \
            -object '\{"qom-type": "iothread", "id": "iothread1"}' \
            -device '\{"id": "scsi0", "driver": "virtio-scsi-pci", "iothread": "iothread0","bus": "pci.0", "addr": "0x5"}' \
            -device '\{"id": "scsi1", "driver": "virtio-scsi-pci", "iothread": "iothread1","bus": "pci.0", "addr": "0x6"}' \
            -blockdev '\{"node-name": "file_image1", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "native", "filename": "/home/kvm_autotest_root/images/rhel930-64-virtio.qcow2", "cache": {"direct": true, "no-flush": false}}' \
            -blockdev '\{"node-name": "drive_image1", "driver": "qcow2", "read-only": false, "cache": {"direct": true, "no-flush": false}, "file": "file_image1"}' \
            -device '\{"driver": "virtio-blk-pci", "id": "image1", "drive": "drive_image1", "bootindex": 0, "write-cache": "on", "bus": "pci.0", "addr": "0x3", "iothread": "iothread0"}' \
            \
            -blockdev '\{"node-name": "file_stg0", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "native", "filename": "/home/kvm_autotest_root/images/stg0.qcow2", "cache": {"direct": true, "no-flush": false}}' \
            -blockdev '\{"node-name": "drive_stg0", "driver": "qcow2", "read-only": false, "cache": {"direct": true, "no-flush": false}, "file": "file_stg0"}' \
            -blockdev '\{"node-name": "file_stg1", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "native", "filename": "/home/kvm_autotest_root/images/stg1.qcow2", "cache": {"direct": true, "no-flush": false}}' \
            -blockdev '\{"node-name": "drive_stg1", "driver": "qcow2", "read-only": false, "cache": {"direct": true, "no-flush": false}, "file": "file_stg1"}' \
            -blockdev '\{"node-name": "file_stg2", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "native", "filename": "/home/kvm_autotest_root/images/stg2.qcow2", "cache": {"direct": true, "no-flush": false}}' \
            -blockdev '\{"node-name": "drive_stg2", "driver": "qcow2", "read-only": false, "cache": {"direct": true, "no-flush": false}, "file": "file_stg2"}' \
            -blockdev '\{"node-name": "file_stg3", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "native", "filename": "/home/kvm_autotest_root/images/stg3.qcow2", "cache": {"direct": true, "no-flush": false}}' \
            -blockdev '\{"node-name": "drive_stg3", "driver": "qcow2", "read-only": false, "cache": {"direct": true, "no-flush": false}, "file": "file_stg3"}' \
            \
            -device '\{"driver": "virtio-net-pci", "mac": "9a:bb:7b:d3:cb:5d", "id": "idKG4PKp", "netdev": "id7VQKjJ", "bus": "pci.0", "addr": "0x4"}' \
            -netdev tap,id=id7VQKjJ,vhost=on \
            -vnc :5 \
            -monitor stdio \
            -qmp tcp:0:5955,server=on,wait=off \
            -rtc base=utc,clock=host,driftfix=slew \
            -boot menu=off,order=cdn,once=c,strict=off \
            -enable-kvm \
            -chardev socket,id=socket-serial,path=/var/tmp/socket-serial,logfile=/var/tmp/file-serial.log,mux=on,server=on,wait=off \
            -serial chardev:socket-serial \
            -chardev file,path=/var/tmp/file-bios.log,id=file-bios \
            -device isa-debugcon,chardev=file-bios,iobase=0x402 \
            -chardev socket,id=socket-qmp,path=/var/tmp/socket-qmp,logfile=/var/tmp/file-qmp.log,mux=on,server=on,wait=off \
            -mon chardev=socket-qmp,mode=control \
            -chardev socket,id=socket-hmp,path=/var/tmp/socket-hmp,logfile=/var/tmp/file-hmp.log,mux=on,server=on,wait=off \
            -mon chardev=socket-hmp,mode=readline \
             
            
             
            
            2. run plug/unplug script
            
            ./run_plug.sh add_disks.txt del_disks.txt
            
             
            
             
            
            cat  add_disks.txt
            {"execute": "qmp_capabilities" }
            {"execute": "device_add", "arguments": \{"driver": "scsi-hd", "id": "stg0", "drive": "drive_stg0", "write-cache": "on", "bus": "scsi0.0"}}
            {"execute": "device_add", "arguments": \{"driver": "scsi-hd", "id": "stg1", "drive": "drive_stg1", "write-cache": "on", "bus": "scsi1.0"}}
            {"execute": "device_add", "arguments": \{"driver": "virtio-blk-pci", "id": "stg2", "drive": "drive_stg2", "write-cache": "on", "bus": "pci.0", "addr": "0x7", "iothread": "iothread0"}}
            {"execute": "device_add", "arguments": \{"driver": "virtio-blk-pci", "id": "stg3", "drive": "drive_stg3", "write-cache": "on", "bus": "pci.0", "addr": "0x8", "iothread": "iothread1"}}
             
            cat del_disks.txt
            { "execute": "qmp_capabilities" }
            { "execute": "device_del", "arguments": \{"id": "stg0"} }
            { "execute": "device_del", "arguments": \{"id": "stg1"} }
            { "execute": "device_del", "arguments": \{"id": "stg2"} }
            { "execute": "device_del", "arguments": \{"id": "stg3"} }
             
            cat run_plug.sh
             
            plug_script=$1
            unplug_script=$2
            
            for ((y=0;y<20;y++)) ; do
            for ((x=0;x<50;x++)); do
            echo "$y:$x"
            nc -U /var/tmp/socket-qmp < ${plug_script} ;
            sleep 0.5
            nc -U /var/tmp/socket-qmp < ${unplug_script};
            sleep 0.5
            done
            sleep 5
            done
             
            3. Log in to the guest (optional)
            Run lsblk to check the disks (spot-check at random iterations); a helper sketch follows below.
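
            As a small aid for step 3 (not part of the original reproducer), a sketch that polls lsblk inside the guest until a given disk shows up, so the "device missing for ten minutes" symptom can be timed instead of spot-checked. The device name "sdb" is a placeholder and depends on the configuration:

            ```python
            # Poll lsblk until a block device with the given name appears or a timeout
            # expires; report how long the device took to show up.
            import subprocess
            import time

            def wait_for_disk(name, timeout=900.0, interval=5.0):
                start = time.monotonic()
                while time.monotonic() - start < timeout:
                    out = subprocess.run(["lsblk", "-ndo", "NAME"],
                                         capture_output=True, text=True).stdout
                    if name in out.split():
                        return time.monotonic() - start
                    time.sleep(interval)
                return None

            elapsed = wait_for_disk("sdb")
            print("disk appeared after %.1f s" % elapsed if elapsed is not None
                  else "disk still missing after timeout")
            ```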
            


            qing wang added a comment -

            Sometimes the disk does not appear in the guest, sometimes QMP stops responding, and sometimes the guest kernel crashes.


            Wrote a script to do the steps listed, couldn’t reproduce it so far. I’ll continue to try on Friday.

             

            From what I understand, there is no error on the host, qemu continues to run fine and e.g. respond to QMP (guest also continues to run), the device just doesn’t appear in the guest, and there is a crash in the guest kernel, right? Is this also exactly what happens for virtio-blk? I’m reminded of https://bugzilla.redhat.com/show_bug.cgi?id=2207634, but if it also appears for virtio-blk, it must be unrelated.


            qing wang added a comment -

            Not only the SCSI disk, but also the virtio-blk disk has the same issue:

            {"execute": "blockdev-add", "arguments": {"node-name": "file_stg0", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "native", "filename": "/home/kvm_autotest_root/images/storage0.raw", "cache": {"direct": true, "no-flush": false}}, "id": "3Q3o1E5T"}
            {"execute": "blockdev-add", "arguments": {"node-name": "drive_stg0", "driver": "raw", "read-only": false, "cache": {"direct": true, "no-flush": false}, "file": "file_stg0"}, "id": "nqP8AgVC"}
            {"execute": "device_add", "arguments": {"driver": "virtio-blk-pci", "id": "stg0", "drive": "drive_stg0", "write-cache": "on", "bus": "pci.0", "addr": "0x5", "iothread": "iothread1"}, "id": "DXHZZ3FW"}


            qing wang added a comment - - edited

            Hotplug failed: http://fileshare.hosts.qa.psi.pek2.redhat.com/pub/logs/qdata/qbugs/j3934/2023-09-26/onepci/debug.log

            OS crash: http://fileshare.hosts.qa.psi.pek2.redhat.com/pub/logs/qdata/qbugs/j3934/2023-09-26/wild_delay/serial-serial0-avocado-vt-vm1.log

            qing wang added a comment -

            automation reproducer

            python ConfigTest.py --testcase=block_hotplug.block_scsi.fmt_qcow2.default.with_plug.with_repetition.one_pci --guestname=RHEL.9.3.0 --firmware=default_bios --clone=no --iothread_scheme=roundrobin --nr_iothreads=2 --nrepeat=5

            python ConfigTest.py --testcase=multi_disk_wild_hotplug.without_delay  --guestname=RHEL.9.3.0 --firmware=default_bios --clone=no --iothread_scheme=roundrobin --nr_iothreads=2 --nrepeat=5

             


              hreitz@redhat.com Hanna Czenczek
              qingwangrh qing wang
              virt-maint virt-maint
              qing wang qing wang
              Votes: 0
              Watchers: 18