Uploaded image for project: 'RHEL'
  1. RHEL
  2. RHEL-19360

guest no response after repeatedly hotplug/unplug disk is doing IO

    • Yes
    • Critical
    • Regression, CustomerScenariosInitiative
    • rhel-sst-virtualization-storage
    • ssg_virtualization
    • None
    • QE ack
    • False
    • Hide

      None

      Show
      None
    • None
    • Red Hat Enterprise Linux
    • None
    • x86_64
    • Linux
    • None

      What were you trying to do that didn't work?

      The guest has no response after repeatedly hotplug/unplugging disk is doing IO

      It will get kernel call track

      [ 246.486224] INFO: task kworker/u16:4:68 blocked for more than 122 seconds.
      [ 246.490419] Not tainted 5.14.0-392.el9.x86_64 #1
      [ 246.493362] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 246.497973] task:kworker/u16:4 state stack:0 pid:68 ppid:2 flags:0x00004000
      [ 246.501369] Workqueue: writeback wb_workfn (flush-253:0)
      [ 246.502614] Call Trace:
      [ 246.503206] <TASK>
      [ 246.503711] __schedule+0x21c/0x560
      [ 246.504545] ? __pfx_wbt_cleanup_cb+0x10/0x10
      [ 246.505621] schedule+0x2d/0x70
      [ 246.506398] io_schedule+0x42/0x70
      [ 246.507257] rq_qos_wait+0xbb/0x130
      [ 246.508098] ? __pfx_rq_qos_wake_function+0x10/0x10
      [ 246.509236] ? __pfx_wbt_inflight_cb+0x10/0x10
      [ 246.510283] wbt_wait+0x9c/0x100
      [ 246.510974] __rq_qos_throttle+0x20/0x40
      [ 246.511750] blk_mq_get_new_requests+0xcf/0x190
      [ 246.512652] blk_mq_submit_bio+0x2db/0x410
      [ 246.513511] __submit_bio_noacct+0x7e/0x1e0
      [ 246.514353] iomap_submit_ioend+0x4e/0x80
      [ 246.515157] xfs_vm_writepages+0x7a/0xb0 [xfs]
      [ 246.516250] do_writepages+0xcc/0x1d0
      [ 246.516971] ? fprop_fraction_percpu+0x2b/0x80
      [ 246.517861] ? __wb_calc_thresh+0x3a/0x130
      [ 246.518680] __writeback_single_inode+0x41/0x270
      [ 246.519602] writeback_sb_inodes+0x209/0x4a0
      [ 246.520447] __writeback_inodes_wb+0x4c/0xe0
      [ 246.521300] wb_writeback+0x1d7/0x2d0
      [ 246.522038] wb_do_writeback+0x22a/0x2b0
      [ 246.522819] wb_workfn+0x5e/0x290
      [ 246.523483] ? _raw_spin_unlock+0xa/0x30
      [ 246.524295] ? finish_task_switch.isra.0+0x8c/0x2a0
      [ 246.525314] ? __schedule+0x224/0x560
      [ 246.526063] process_one_work+0x1e2/0x3b0
      [ 246.526843] worker_thread+0x50/0x3a0
      [ 246.527571] ? __pfx_worker_thread+0x10/0x10
      [ 246.528416] kthread+0xdd/0x100
      [ 246.529052] ? __pfx_kthread+0x10/0x10
      [ 246.529775] ret_from_fork+0x29/0x50
      [ 246.530495] </TASK>
      [ 246.530945] INFO: task xfsaild/dm-0:775 blocked for more than 122 seconds.
      [ 246.532269] Not tainted 5.14.0-392.el9.x86_64 #1
      [ 246.533280] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 246.534769] task:xfsaild/dm-0 state stack:0 pid:775 ppid:2 flags:0x00004000
      [ 246.536353] Call Trace:
      [ 246.536832] <TASK>
      [ 246.537264] __schedule+0x21c/0x560
      [ 246.537939] ? __pfx_wbt_cleanup_cb+0x10/0x10
      [ 246.538809] schedule+0x2d/0x70
      [ 246.539472] io_schedule+0x42/0x70
      [ 246.540186] rq_qos_wait+0xbb/0x130
      [ 246.540865] ? __pfx_rq_qos_wake_function+0x10/0x10
      [ 246.541806] ? __pfx_wbt_inflight_cb+0x10/0x10
      [ 246.543155] wbt_wait+0x9c/0x100
      [ 246.544150] __rq_qos_throttle+0x20/0x40
      [ 246.545351] blk_mq_get_new_requests+0xcf/0x190
      [ 246.546727] blk_mq_submit_bio+0x2db/0x410
      [ 246.547976] __submit_bio_noacct+0x7e/0x1e0
      [ 246.549242] xfs_buf_ioapply_map+0x1cb/0x270 [xfs]
      [ 246.550885] _xfs_buf_ioapply+0xcf/0x1b0 [xfs]
      [ 246.552427] ? __pfx_default_wake_function+0x10/0x10
      [ 246.553922] __xfs_buf_submit+0x6e/0x1e0 [xfs]
      [ 246.555446] xfs_buf_delwri_submit_buffers+0xe3/0x230 [xfs]
      [ 246.557292] xfsaild_push+0x19d/0x730 [xfs]
      [ 246.559356] xfsaild+0xa4/0x1e0 [xfs]
      [ 246.562148] ? __pfx_xfsaild+0x10/0x10 [xfs]
      [ 246.565300] kthread+0xdd/0x100
      [ 246.567153] ? __pfx_kthread+0x10/0x10
      [ 246.569323] ret_from_fork+0x29/0x50
      [ 246.570877] </TASK>
      [ 369.361608] INFO: task kworker/u16:4:68 blocked for more than 245 seconds.
      [ 369.365631] Not tainted 5.14.0-392.el9.x86_64 #1
      [ 369.368570] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 369.372355] task:kworker/u16:4 state stack:0 pid:68 ppid:2 flags:0x00004000
      [ 369.374502] Workqueue: writeback wb_workfn (flush-253:0)
      [ 369.375876] Call Trace:
      [ 369.376533] <TASK>
      [ 369.377091] __schedule+0x21c/0x560
      [ 369.378003] ? __pfx_wbt_cleanup_cb+0x10/0x10
      [ 369.379151] schedule+0x2d/0x70
      [ 369.379984] io_schedule+0x42/0x70
      [ 369.380877] rq_qos_wait+0xbb/0x130
      [ 369.381812] ? __pfx_rq_qos_wake_function+0x10/0x10
      [ 369.382801] ? __pfx_wbt_inflight_cb+0x10/0x10
      [ 369.383707] wbt_wait+0x9c/0x100
      [ 369.384355] __rq_qos_throttle+0x20/0x40
      [ 369.385154] blk_mq_get_new_requests+0xcf/0x190
      [ 369.386076] blk_mq_submit_bio+0x2db/0x410
      [ 369.386915] __submit_bio_noacct+0x7e/0x1e0
      [ 369.387772] iomap_submit_ioend+0x4e/0x80
      [ 369.388630] xfs_vm_writepages+0x7a/0xb0 [xfs]
      [ 369.389747] do_writepages+0xcc/0x1d0
      [ 369.389754] ? fprop_fraction_percpu+0x2b/0x80
      [ 369.391398] ? __wb_calc_thresh+0x3a/0x130
      [ 369.392207] __writeback_single_inode+0x41/0x270
      [ 369.393156] writeback_sb_inodes+0x209/0x4a0
      [ 369.394030] __writeback_inodes_wb+0x4c/0xe0
      [ 369.394916] wb_writeback+0x1d7/0x2d0
      [ 369.395661] wb_do_writeback+0x22a/0x2b0
      [ 369.396457] wb_workfn+0x5e/0x290
      [ 369.397118] ? _raw_spin_unlock+0xa/0x30
      [ 369.397911] ? finish_task_switch.isra.0+0x8c/0x2a0
      [ 369.398919] ? __schedule+0x224/0x560
      [ 369.399664] process_one_work+0x1e2/0x3b0
      [ 369.400478] worker_thread+0x50/0x3a0
      [ 369.401203] ? __pfx_worker_thread+0x10/0x10
      [ 369.402063] kthread+0xdd/0x100
      [ 369.402709] ? __pfx_kthread+0x10/0x10
      [ 369.403469] ret_from_fork+0x29/0x50
      [ 369.404182] </TASK>
      [ 369.404667] INFO: task xfsaild/dm-0:775 blocked for more than 245 seconds.
      [ 369.406021] Not tainted 5.14.0-392.el9.x86_64 #1
      [ 369.407032] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 369.408551] task:xfsaild/dm-0 state stack:0 pid:775 ppid:2 flags:0x00004000
      [ 369.410169] Call Trace:
      [ 369.410681] <TASK>
      [ 369.411106] __schedule+0x21c/0x560
      [ 369.411814] ? __pfx_wbt_cleanup_cb+0x10/0x10
      [ 369.411817] schedule+0x2d/0x70
      [ 369.411820] io_schedule+0x42/0x70
      [ 369.411823] rq_qos_wait+0xbb/0x130
      [ 369.411827] ? __pfx_rq_qos_wake_function+0x10/0x10
      [ 369.411830] ? __pfx_wbt_inflight_cb+0x10/0x10
      [ 369.411834] wbt_wait+0x9c/0x100
      [ 369.411837] __rq_qos_throttle+0x20/0x40
      [ 369.411841] blk_mq_get_new_requests+0xcf/0x190
      [ 369.411846] blk_mq_submit_bio+0x2db/0x410
      [ 369.411851] __submit_bio_noacct+0x7e/0x1e0
      [ 369.411856] xfs_buf_ioapply_map+0x1cb/0x270 [xfs]
      [ 369.412024] _xfs_buf_ioapply+0xcf/0x1b0 [xfs]
      [ 369.412170] ? __pfx_default_wake_function+0x10/0x10
      [ 369.412175] __xfs_buf_submit+0x6e/0x1e0 [xfs]
      [ 369.412322] xfs_buf_delwri_submit_buffers+0xe3/0x230 [xfs]
      [ 369.412494] xfsaild_push+0x19d/0x730 [xfs]
      [ 369.412655] xfsaild+0xa4/0x1e0 [xfs]
      [ 369.412811] ? __pfx_xfsaild+0x10/0x10 [xfs]
      [ 369.412965] kthread+0xdd/0x100
      [ 369.412968] ? __pfx_kthread+0x10/0x10
      [ 369.412970] ret_from_fork+0x29/0x50
      [ 369.412977] </TASK>
      [ 369.412990] INFO: task in:imjournal:1293 blocked for more than 122 seconds.
      [ 369.412992] Not tainted 5.14.0-392.el9.x86_64 #1
      [ 369.412993] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 369.412993] task:in:imjournal state stack:0 pid:1293 ppid:1 flags:0x00000002
      [ 369.412997] Call Trace:
      [ 369.412998] <TASK>
      [ 369.413000] __schedule+0x21c/0x560
      [ 369.413002] ? up+0x12/0x60
      [ 369.413007] schedule+0x2d/0x70
      [ 369.413009] io_schedule+0x42/0x70
      [ 369.413012] folio_wait_bit+0xe9/0x200
      [ 369.413017] ? xas_load+0x9/0xa0
      [ 369.413022] ? __pfx_wake_page_function+0x10/0x10
      [ 369.413027] folio_wait_writeback+0x28/0x80
      [ 369.413031] truncate_inode_partial_folio+0x5a/0x130
      [ 369.413037] truncate_inode_pages_range+0x1ba/0x440
      [ 369.413043] ? fsnotify_move+0x145/0x1a0
      [ 369.413047] ? __inode_wait_for_writeback+0x7e/0xf0
      [ 369.413052] ? __pfx_wake_bit_function+0x10/0x10
      [ 369.413057] evict+0x1b0/0x1d0
      [ 369.413060] __dentry_kill+0xdf/0x180
      [ 369.413064] dput+0x144/0x2f0
      [ 369.413067] do_renameat2+0x303/0x530
      [ 369.413073] __x64_sys_rename+0x40/0x50
      [ 369.413076] do_syscall_64+0x59/0x90
      [ 369.413081] ? syscall_exit_work+0x103/0x130
      [ 369.413086] ? syscall_exit_to_user_mode+0x22/0x40
      [ 369.413090] ? do_syscall_64+0x69/0x90
      [ 369.413093] ? do_syscall_64+0x69/0x90
      [ 369.413096] entry_SYSCALL_64_after_hwframe+0x72/0xdc
      [ 369.413102] RIP: 0033:0x7f0ffc8727eb
      [ 369.413138] RSP: 002b:00007f0ffb11eb68 EFLAGS: 00000246 ORIG_RAX: 0000000000000052
      [ 369.413144] RAX: ffffffffffffffda RBX: 0000000000000078 RCX: 00007f0ffc8727eb
      [ 369.413145] RDX: 0000000000000000 RSI: 000055e6b8e5e490 RDI: 00007f0ffb11eb70
      [ 369.413147] RBP: 0000000000000009 R08: 0000000000000000 R09: 00007f0ffb11e9f0
      [ 369.413149] R10: 00000000000001a4 R11: 0000000000000246 R12: 00007f0ffb11eb70
      [ 369.413150] R13: 0000000000000078 R14: 00007f0ffb11fc20 R15: 00007f0fec004f10
      [ 369.413154] </TASK>

      Please provide the package NVR for which bug is seen:

      Red Hat Enterprise Linux release 9.4 Beta (Plow)
      5.14.0-391.el9.x86_64
      qemu-kvm-8.1.0-4.el9.x86_64
      seabios-bin-1.16.1-1.el9.noarch
      edk2-ovmf-20230524-4.el9_3.noarch
      libvirt-9.5.0-7.el9_3.x86_64
      virtio-win-prewhql-0.1-240.iso

      How reproducible:
      >30 in automation

      Steps to reproduce

      1. create image file
      
      2. Boot vm
      
      /usr/libexec/qemu-kvm \
           -name 'avocado-vt-vm1'  \
           -sandbox on \
           -machine pc,memory-backend=mem-machine_mem  \
           -nodefaults \
           -device '{"driver": "VGA", "bus": "pci.0", "addr": "0x2"}' \
           -m 30720 \
           -object '{"size": 32212254720, "id": "mem-machine_mem", "qom-type": "memory-backend-ram"}'  \
           -smp 8,maxcpus=8,cores=4,threads=1,dies=1,sockets=2  \
           -cpu 'Skylake-Server-IBRS',ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,clflushopt=on,umip=on,pku=on,md-clear=on,stibp=on,flush-l1d=on,arch-capabilities=on,ssbd=on,xsaves=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,rsba=on,skip-l1dfl-vmentry=on,pschange-mc-no=on,kvm_pv_unhalt=on \
           -device '{"driver": "ich9-usb-ehci1", "id": "usb1", "addr": "0x1d.0x7", "multifunction": true, "bus": "pci.0"}' \
           -device '{"driver": "ich9-usb-uhci1", "id": "usb1.0", "multifunction": true, "masterbus": "usb1.0", "addr": "0x1d.0x0", "firstport": 0, "bus": "pci.0"}' \
           -device '{"driver": "ich9-usb-uhci2", "id": "usb1.1", "multifunction": true, "masterbus": "usb1.0", "addr": "0x1d.0x2", "firstport": 2, "bus": "pci.0"}' \
           -device '{"driver": "ich9-usb-uhci3", "id": "usb1.2", "multifunction": true, "masterbus": "usb1.0", "addr": "0x1d.0x4", "firstport": 4, "bus": "pci.0"}' \
           -device '{"driver": "usb-tablet", "id": "usb-tablet1", "bus": "usb1.0", "port": "1"}' \
           -object '{"qom-type": "iothread", "id": "iothread0"}' \
           -object '{"qom-type": "iothread", "id": "iothread1"}' \
           -device '{"id": "virtio_scsi_pci0", "driver": "virtio-scsi-pci", "bus": "pci.0", "addr": "0x3", "iothread": "iothread0"}' \
           -blockdev '{"node-name": "file_image1", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "threads", "filename": "/home/kvm_autotest_root/images/rhel940-64-virtio-scsi.qcow2", "cache": {"direct": true, "no-flush": false}}' \
           -blockdev '{"node-name": "drive_image1", "driver": "qcow2", "read-only": false, "cache": {"direct": true, "no-flush": false}, "file": "file_image1"}' \
           -device '{"driver": "scsi-hd", "id": "image1", "drive": "drive_image1", "write-cache": "on"}' \
           -device '{"driver": "virtio-net-pci", "mac": "9a:3b:47:ea:7d:47", "id": "iddK820r", "netdev": "idXMsvPf", "bus": "pci.0", "addr": "0x4"}' \
           -netdev  '{"id": "idXMsvPf", "type": "tap", "vhost": true}'  \
           -vnc :5 \
           -monitor stdio \
           -qmp tcp:0:5955,server=on,wait=off \
           -boot menu=on,reboot-timeout=1000,strict=off \
           \
           -chardev socket,id=socket-serial,path=/var/tmp/socket-serial,logfile=/var/tmp/file-serial.log,mux=on,server=on,wait=off \
           -serial chardev:socket-serial \
           -chardev file,path=/var/tmp/file-bios.log,id=file-bios \
           -device isa-debugcon,chardev=file-bios,iobase=0x402 \
           \
      
      3. hotplug disk
      
      {"execute": "blockdev-add", "arguments": {"node-name": "file_stg0", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "threads", "filename": "/home/kvm_autotest_root/images/storage0.qcow2", "cache": {"direct": true, "no-flush": false}}, "id": "uNRhq5lR"}
      {"execute": "blockdev-add", "arguments": {"node-name": "drive_stg0", "driver": "qcow2", "read-only": false, "cache": {"direct": true, "no-flush": false}, "file": "file_stg0"}, "id": "VtvrjLkH"}
      {"execute": "device_add", "arguments": {"driver": "scsi-hd", "id": "stg0", "bus": "virtio_scsi_pci0.0", "drive": "drive_stg0", "write-cache": "on"}, "id": "pWLotHhW"}
      
      4. execute on added disk 
      
      dev=`lsblk|grep 30G|awk '{print $1}'`
      echo "$dev"
      parted -s "/dev/$dev" mklabel msdos
      parted -s "/dev/$dev" mkpart primary 0M 30720.0M
      partprobe /dev/$dev
      yes|mkfs.ext4 -F "/dev/${dev}1"
      mkdir -p /mnt/$dev
      #umount -fl /mnt/$dev
      mount -t ext4 /dev/${dev}1 /mnt/$dev
      if mount|grep $dev;then
      #/home/iozone_inst/src/current/iozone -az -g 10G -y 32k -i 0 -i 1 -I -f /mnt/$dev/iozone_test &
      fio --direct=1 --name=test --iodepth=1 --thread --rw=randwrite --bs=4096 --runtime=3600 --filename=/mnt/$dev/test.img --size=25G
      
      else
      	echo "mount err"
      fi
      
      5. sleep 10 seconds
      
      6. unplug the disk
      {"execute": "device_del", "arguments": {"id": "stg0"}, "id": "R81C1q8p"}
      {"execute": "blockdev-del", "arguments": {"node-name": "drive_stg0"}, "id": "EAiI2LVe"}
      {"execute": "blockdev-del", "arguments": {"node-name": "file_stg0"}, "id": "IJRbWqTG"}
      
      7. repeat steps 3-6 
      
      

      Expected results
      operation succeeds

      Actual results

      can not find new hotplug disk, the guest has no response

      It does not hit the issue on
      Red Hat Enterprise Linux release 9.3 (Plow)
      5.14.0-362.13.1.el9_3.x86_64
      qemu-kvm-8.0.0-16.el9_3.1.x86_64
      seabios-bin-1.16.1-1.el9.noarch
      edk2-ovmf-20230524-4.el9_3.noarch
      libvirt-9.5.0-7.el9_3.x86_64
      virtio-win-prewhql-0.1-240.iso

              kwolf@redhat.com Kevin Wolf
              qingwangrh qing wang
              virt-maint virt-maint
              qing wang qing wang
              Votes:
              0 Vote for this issue
              Watchers:
              12 Start watching this issue

                Created:
                Updated:
                Resolved: