- Bug
- Resolution: Unresolved
- rhel-9.5
- Moderate
- rhel-sst-virtualization-storage
- ssg_virtualization
- Red Hat Enterprise Linux
- Automated
- s390x
What were you trying to do that didn't work?
Write to an XFS filesystem in the guest after deleting an external snapshot.
Please provide the package NVR for which bug is seen:
libvirt-10.5.0-4.el9.s390x
qemu-kvm-9.0.0-7.el9.s390x
guest kernel: 5.14.0-480
How reproducible:
10%
Steps to reproduce
- Write to a file in the guest (use dd)
- Create an external snapshot (with memspec and diskspec)
- Write to a file in the guest (use dd)
- Delete the external snapshot
- Repeat and mix up these steps
Use the attached script.sh to reproduce this more reliably:
- Start a VM with console on unix socket to send commands
<serial type="unix">
  <source mode="bind" path="/tmp/vm"/>
  <target type="sclp-serial" port="0">
    <model name="sclpconsole"/>
  </target>
</serial>
- Log in to the guest console through the socket
nc -U /tmp/vm
- Close the connection (CTRL+C)
- In a console run
while true; do sh script.sh; done
- At some point the dd command stops returning: its usual output, such as "51200 bytes (51 kB, 50 KiB) copied, 0.000325012 s, 158 MB/s", no longer appears, and the dd command is followed directly by the 'virsh snapshot-delete' command. At that point, stop the script and try to access the console again
nc -U /tmp/vm
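The snapshot create/delete cycle from the steps above can be sketched roughly as follows. This is not the attached script.sh; the domain name `vm1`, the disk target `vda`, the snapshot directory and the `DRY_RUN` switch are assumptions for illustration. With `DRY_RUN=1` (the default here) the commands are only printed, so the sketch can be inspected without touching a real VM.

```shell
#!/bin/sh
# Hypothetical values: adjust to the local setup.
DOMAIN="${DOMAIN:-vm1}"
DISK="${DISK:-vda}"
SNAP_DIR="${SNAP_DIR:-/var/lib/libvirt/images}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands instead of running them

# Run a command, or just print it in dry-run mode.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# One iteration: create an external snapshot (memory + disk), then delete it.
# The guest-side dd writes are expected to happen before and between these
# two calls, sent over the serial console as described in the steps.
snapshot_cycle() {
    name="$1"
    run virsh snapshot-create-as "$DOMAIN" "$name" \
        --memspec "file=$SNAP_DIR/$name.mem,snapshot=external" \
        --diskspec "$DISK,snapshot=external,file=$SNAP_DIR/$name.qcow2"
    run virsh snapshot-delete "$DOMAIN" "$name"
}

snapshot_cycle "s1"
```

Setting `DRY_RUN=0` would execute the virsh commands against the named domain.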
Expected results
All actions succeed.
Actual results
At some point the console becomes unresponsive. Inspecting the memory dump we can see at least one uninterruptible (UN) task for dd:
PID: 1587 TASK: 7cf5c00 CPU: 0 COMMAND: "dd"
 #0 [380000af258] __schedule at 28505518
 #1 [380000af2d8] schedule at 2850583e
 #2 [380000af308] io_schedule at 2850598a
 #3 [380000af338] folio_wait_bit_common at 27db4cdc
 #4 [380000af418] folio_wait_writeback at 27dbf3be
 #5 [380000af450] __filemap_fdatawait_range at 27db34a0
 #6 [380000af540] filemap_write_and_wait_range at 27db7580
 #7 [380000af588] xfs_setattr_size at 3ff7fd3e222 [xfs]
 #8 [380000af610] xfs_vn_setattr at 3ff7fd3e51c [xfs]
 #9 [380000af668] notify_change at 27ed81b4
#10 [380000af718] do_truncate at 27ea9f84
#11 [380000af7c0] do_open at 27ec1486
#12 [380000af830] path_openat at 27ec3f7c
#13 [380000af898] do_filp_open at 27ec53c8
#14 [380000af9c0] do_sys_openat2 at 27eab550
#15 [380000afa28] do_sys_open at 27eabada
#16 [380000afa70] __do_syscall at 284fddf0
#17 [380000afe98] system_call at 2850cb18
PSW: 0705000180000000 000003ffb1c00670 (user space)
GPRS: 0000000000000000 0000000000000120 ffffffffffffffda 000003ffe6e7ad22
      0000000000000241 00000000000001b6 00000000000001b6 000003ffe6e7ad2b
      00000000000001b6 000003ffe6e7ad22 000003ffb1ef66a0 0000000000000241
      000003ffb1eaff68 0000000000000241 000002aa2788a28e 000003ffe6e797c0
Other uninterruptible tasks might be listed, too:
PID: 545 TASK: 2779700 CPU: 0 COMMAND: "xfsaild/dm-0"
 #0 [37fffe0b880] __schedule at 936d518
 #1 [37fffe0b900] schedule at 936d83e
 #2 [37fffe0b930] io_schedule at 936d98a
 #3 [37fffe0b960] rq_qos_wait at 8ebae90
 #4 [37fffe0ba00] wbt_wait at 8ee5228
 #5 [37fffe0ba58] __rq_qos_throttle at 8eba99e
 #6 [37fffe0ba90] blk_mq_submit_bio at 8eaa6c8
 #7 [37fffe0bb20] __submit_bio_noacct at 8e98934
 #8 [37fffe0bb70] _xfs_buf_ioapply at 3ff7fd1b9e2 [xfs]
 #9 [37fffe0bc38] __xfs_buf_submit at 3ff7fd1bbbe [xfs]
#10 [37fffe0bc70] xfs_buf_delwri_submit_buffers at 3ff7fd1c36c [xfs]
#11 [37fffe0bd00] xfsaild_push at 3ff7fd5dc84 [xfs]
#12 [37fffe0bdb8] xfsaild at 3ff7fd5e296 [xfs]
#13 [37fffe0be10] kthread at 8a303b0
#14 [37fffe0be68] __ret_from_fork at 89aeebc
#15 [37fffe0be98] ret_from_fork at 9374b4a
PID: 553 TASK: b7d0000 CPU: 1 COMMAND: "kworker/u10:3"
 #0 [37fffe83318] __schedule at 936d518
 #1 [37fffe83398] schedule at 936d83e
 #2 [37fffe833c8] io_schedule at 936d98a
 #3 [37fffe833f8] rq_qos_wait at 8ebae90
 #4 [37fffe83498] wbt_wait at 8ee5228
 #5 [37fffe834f0] __rq_qos_throttle at 8eba99e
 #6 [37fffe83528] blk_mq_submit_bio at 8eaa6c8
 #7 [37fffe835b8] __submit_bio_noacct at 8e98934
 #8 [37fffe83608] iomap_submit_ioend at 8da5f76
 #9 [37fffe83648] iomap_writepage_map at 8da6de4
#10 [37fffe836f0] iomap_do_writepage at 8da716e
#11 [37fffe83748] write_cache_pages at 8c298c6
#12 [37fffe83878] iomap_writepages at 8da5fe6
#13 [37fffe838a8] xfs_vm_writepages at 3ff7fd14732 [xfs]
#14 [37fffe83950] do_writepages at 8c2ae3a
#15 [37fffe839d0] __writeback_single_inode at 8d5c06c
#16 [37fffe83a28] writeback_sb_inodes at 8d5c97c
#17 [37fffe83b20] __writeback_inodes_wb at 8d5cd3a
#18 [37fffe83b80] wb_writeback at 8d5d07e
#19 [37fffe83c30] wb_workfn at 8d5e3c8
#20 [37fffe83d10] process_one_work at 8a24872
#21 [37fffe83d98] worker_thread at 8a2575e
#22 [37fffe83e10] kthread at 8a303b0
#23 [37fffe83e68] __ret_from_fork at 89aeebc
#24 [37fffe83e98] ret_from_fork at 9374b4a
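When a dump contains many tasks, it can help to list only those whose backtrace passes through io_schedule, as all of the traces above do. The following is a minimal sketch, assuming the crash `bt` output has been saved to a text file; the function name `blocked_in_io` and the sample data are made up for illustration.

```shell
#!/bin/sh
# Print the header line of every task whose backtrace contains io_schedule.
# Argument: a text file holding saved crash "bt" output.
blocked_in_io() {
    awk '
        /^PID:/ { header = $0; seen = 0 }   # a new task header starts here
        /io_schedule/ && !seen { print header; seen = 1 }
    ' "$1"
}

# Tiny made-up sample in the crash bt layout, for illustration only.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
PID: 1 TASK: aaa CPU: 0 COMMAND: "blocked"
 #0 [0001] __schedule at 1000
 #1 [0002] io_schedule at 2000
PID: 2 TASK: bbb CPU: 1 COMMAND: "idle"
 #0 [0003] __schedule at 1000
EOF
blocked_in_io "$sample"
rm -f "$sample"
```

The filter only matches on the function name, so it reports tasks waiting on I/O regardless of which filesystem or block-layer path led there.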
Additional information
- This also reproduces for internal snapshots.
- At this point I'm not sure whether this would happen on other architectures. I have not found similar failures in our records for other architectures. UPDATE: Meina couldn't reproduce this on x86_64.
- I tried running the loop for 4 minutes in a guest that was installed with the ext4 filesystem and did not reproduce the problem.
- The issue was hit by automated test "snapshot_delete.multiple_children.del_parent_snap" but only reproduces in 10% of the test case executions.
- Also reproduces with RHEL 9.4: guest kernel kernel-5.14.0-427.13.1.el9_4.s390x, qemu-kvm-8.2.0-11.el9_4.s390x and libvirt-10.0.0-6.el9_4.s390x. External snapshots were improved to be fully supported in RHEL 9.4, so I'm not considering this a regression.
- I confirmed that the dd commands succeed repeatedly without issues on the same setup when the snapshot operations are omitted.
- is related to: RHEL-57677 VM hangs after live dump (Planning)
- relates to: RHEL-7528 RFE: libvirt - improve support for external snapshots (merge, delete, virsh, etc) (Closed)