Uploaded image for project: 'RHEL'
  1. RHEL
  2. RHEL-88175

process attempting to read a disk backed by 'vhost-vdpa' gets stuck after pivoting a blockdev-mirror

Linking RHIVOS CVEs to...Migration: Automation ...Sync from "Extern...XMLWordPrintable

    • No
    • Moderate
    • Regression
    • 2
    • rhel-virt-storage
    • ssg_virtualization
    • 0
    • False
    • False
    • Hide

      None

      Show
      None
    • None
    • virt-storage Sprint 6, Patch Under review backlog
    • Fail
    • None
    • x86_64
    • None

      What were you trying to do that didn't work?

      'dd' and 'fdisk -l' commands hang in the guest after blockcopy and block job pivoting. 

      Please provide the package NVR for which bug is seen:

      libvirt-10.3.0-2.el9.x86_64
      qemu-kvm-9.0.0-3.el9.x86_64
      kernel-5.14.0-452.el9.x86_64

      also reproduced with:
      libvirt-11.2.0-1.el10.x86_64
      qemu-kvm-9.1.0-17.el10.x86_64
       

      How reproducible:

      100%

      Steps to reproduce

      1. Prepare two simulated vhost-vdpa disks on host:
      modprobe vhost_vdpa
      modprobe vdpa_sim_blk
      vdpa dev add mgmtdev vdpasim_blk name vdpa0
      vdpa dev add mgmtdev vdpasim_blk name vdpa1
      2. Define a guest with vhostvdpa disk and shared memory.
      <memoryBacking>
      <access mode='shared'/>
      </memoryBacking>
      ...
      <disk type='vhostvdpa' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='threads' copy_on_read='on' discard='unmap' detect_zeroes='on'/>
      <source dev='/dev/vhost-vdpa-0'/>
      <target dev='vdb' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
      </disk>

      3. Start the vm and check the qemu cmd

      -add-fd set=0,fd=20,opaque=libvirt-1-storage-vdpa -blockdev \{"driver":"virtio-blk-vhost-vdpa","path":"/dev/fdset/0","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap","cache":{"direct":true,"no-flush":false}} -blockdev \{"node-name":"libvirt-1-format","read-only":false,"discard":"unmap","detect-zeroes":"on","cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-1-storage"} -blockdev \{"driver":"copy-on-read","node-name":"libvirt-CoR-vdb","file":"libvirt-1-format","discard":"unmap"} -device \{"driver":"virtio-blk-pci","bus":"pci.7","addr":"0x0","drive":"libvirt-CoR-vdb","id":"virtio-disk1","write-cache":"on"}
      

      4. Exec virsh blockcopy

      1. cat /tmp/xml_utils_temp_rwbbxhvf.xml
        <disk type="vhostvdpa" device="disk"><target dev="vdb" bus="virtio" /><source dev="/dev/vhost-vdpa-1" /><driver name="qemu" type="raw" cache="none" io="threads" copy_on_read="on" discard="unmap" detect_zeroes="on" /></disk>
      2. virsh blockcopy avocado-vt-vm1 vdb --xml /tmp/xml_utils_temp_rwbbxhvf.xml --transient-job --wait --verbose

      5. virsh blockjob avocado-vt-vm1 vdb --pivot
      6. login to vm and run 'dd' or 'fdisk -l'

      1. dd if=/dev/zero of=/dev/vdb bs=10

      Expected results

      There is no error during blockcopy; 'dd' and 'fdisk -l' commands completed without error.

      Actual results

      1. 'dd' command hangs in step 6

      [root@localhost ~]# dd if=/dev/zero of=/dev/vdb bs=10
      [  246.952532] INFO: task dd:1314 blocked for more than 122 seconds.
      [  246.952956]       Not tainted 5.14.0-452.el9.x86_64 #1
      [  246.953164] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  246.953543] task:dd              state:D stack:0     pid:1314  tgid:1314  ppid:1287   flags:0x00000002
      [  246.953814] Call Trace:
      [  246.953975]  <TASK>
      [  246.954134]  __schedule+0x21b/0x550
      [  246.954546]  schedule+0x2e/0xd0
      [  246.954749]  io_schedule+0x42/0x70
      [  246.954954]  bit_wait_io+0xd/0x60
      [  246.955212]  __wait_on_bit+0x48/0x150
      [  246.955459]  ? __pfx_bit_wait_io+0x10/0x10
      [  246.955702]  out_of_line_wait_on_bit+0x92/0xb0
      [  246.955909]  ? __pfx_wake_bit_function+0x10/0x10
      [  246.956156]  __block_write_begin_int+0x1fd/0x370
      [  246.956468]  iomap_write_begin+0x1cb/0x2b0
      [  246.956716]  iomap_write_iter+0xfb/0x250
      [  246.956927]  ? do_user_addr_fault+0x1d6/0x6a0
      [  246.957156]  iomap_file_buffered_write+0x83/0xe0
      [  246.957430]  blkdev_write_iter+0x1b9/0x270
      [  246.957669]  vfs_write+0x2cb/0x410
      [  246.957924]  ksys_write+0x5f/0xe0
      [  246.958133]  do_syscall_64+0x5c/0xf0
      [  246.958395]  ? exc_page_fault+0x62/0x150
      [  246.958638]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
      [  246.958871] RIP: 0033:0x7f7f520fda57
      [  246.959443] RSP: 002b:00007ffd6b53bd38 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [  246.959673] RAX: ffffffffffffffda RBX: 000000000000000a RCX: 00007f7f520fda57
      [  246.959857] RDX: 000000000000000a RSI: 000055d4cb140000 RDI: 0000000000000001
      [  246.960003] RBP: 000055d4cb140000 R08: 000055d4cb140000 R09: 0000000000000077
      [  246.960128] R10: 000000000000006b R11: 0000000000000246 R12: 0000000000000000
      [  246.960253] R13: 0000000000000000 R14: 0000000000000000 R15: 000055d4cb140000
      [  246.960417]  </TASK>
      

      Additional info:
      reproduced on RHEL9.4:
      libvirt-10.0.0-6.3.el9_4.x86_64
      qemu-kvm-8.2.0-11.el9_4.2.x86_64
      kernel-5.14.0-427.1.1.el9_4.x86_64

              shajnocz@redhat.com Stefan Hajnoczi
              yicui1 Yingshun Cui
              virt-maint virt-maint
              Qinghua Cheng Qinghua Cheng
              Votes:
              0 Vote for this issue
              Watchers:
              15 Start watching this issue

                Created:
                Updated: