Bug | Resolution: Done | rhel-9.6 | Important | rhel-sst-virtualization-storage | ssg_virtualization | QE ack | Red Hat Enterprise Linux | virt-storage Sprint 3 | New Test Coverage | x86_64 | Linux
What were you trying to do that didn't work?
The dst guest hits a call trace during reboot.
Please provide the package NVR for which the bug is seen:
kernel version: 6.12.0-31.el10.x86_64
qemu-kvm version: qemu-kvm-9.1.0-6.el10
How reproducible:
About 60% with automated testing, less than 10% when reproduced manually.
Steps to reproduce
1. Start the src guest with the following qemu command line:
/usr/libexec/qemu-kvm \
-name "mouse-vm" \
-sandbox off \
-machine q35,pflash0=drive_ovmf_code,pflash1=drive_ovmf_vars \
-cpu Cascadelake-Server \
-nodefaults \
-vga std \
-chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1,server=on,wait=off \
-chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor,server=on,wait=off \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-mon chardev=qmp_id_catch_monitor,mode=control \
-device '
' \
-device '
' \
-device '
' \
-device '
' \
-device '
' \
-device '
' \
-device '
' \
-device '
' \
-device '
' \
-device '
' \
-device '
' \
-device '
' \
-device '
' \
-blockdev '{"driver":"file","cache":
,"filename":"/mnt/rhel100-64-virtio-scsi-ovmf.qcow2","node-name":"drive_sys1"}' \
-blockdev '
' \
-blockdev '
' \
-blockdev '
' \
-blockdev '
' \
-blockdev '
' \
-netdev tap,id=tap0,vhost=on \
-m 4096 \
-object '
' \
-smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
-vnc :10 \
-rtc base=utc,clock=host \
-boot menu=off,strict=off,order=cdn,once=c \
-enable-kvm \
-qmp tcp:0:3333,server=on,wait=off \
-qmp tcp:0:9999,server=on,wait=off \
-qmp tcp:0:9888,server=on,wait=off \
-serial tcp:0:4444,server=on,wait=off \
-monitor stdio \
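The JSON bodies of the -device/-blockdev/-object options are not shown above. As a rough sketch only (the node names, cache settings, and device layout here are assumptions, not the exact values used in this run), the system-disk part of a virtio-scsi command line of this shape typically looks like:
-blockdev '{"node-name":"drive_sys1","driver":"file","cache":{"direct":true,"no-flush":false},"filename":"/mnt/rhel100-64-virtio-scsi-ovmf.qcow2"}' \
-blockdev '{"node-name":"drive_image1","driver":"qcow2","cache":{"direct":true,"no-flush":false},"file":"drive_sys1"}' \
-device '{"driver":"virtio-scsi-pci","id":"scsi0"}' \
-device '{"driver":"scsi-hd","id":"image1","drive":"drive_image1","bootindex":0}' \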
2. Log in to the guest, dd a new file on it, and record its md5sum:
- dd if=/dev/zero of=/home/file_base1 bs=1M count=5120 oflag=direct
5120+0 records in
5120+0 records out
- md5sum /home/file_base1 | awk '{print $1}'
ec4bcc8776ea04479b786e063a9ace45
3. Create a snapshot:
{"execute": "blockdev-create", "arguments": {"options":
, "job-id": "job_90"}, "id": "Th01WLJT"}
{"execute": "job-dismiss", "arguments":
, "id": "OobzvjcA"}
{"execute": "blockdev-add", "arguments":
, "id": "VNhLjtat"}
{"execute": "blockdev-create", "arguments": {"options":
, "job-id": "job_8"}, "id": "sz7n2MUh"}
{"execute": "job-dismiss", "arguments":
, "id": "HOriXFVb"}
{"execute": "blockdev-add", "arguments":
, "id": "uhlrWPru"}
{"execute": "blockdev-snapshot", "arguments":
, "id": "tbKuuC4b"}
4. Create a snapshot image on the dst host:
- ssh root@$dst_ip qemu-img create -f qcow2 -F qcow2 /home/mirror.qcow2 -b /mnt/rhel100-64-virtio-scsi-ovmf.qcow2 20G
5. Start the dst guest with the following qemu command line:
/usr/libexec/qemu-kvm \
-name "mouse-vm" \
-sandbox off \
-machine q35,pflash0=drive_ovmf_code,pflash1=drive_ovmf_vars \
-cpu Cascadelake-Server \
-nodefaults \
-vga std \
-chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1,server=on,wait=off \
-chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor,server=on,wait=off \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-mon chardev=qmp_id_catch_monitor,mode=control \
-device '
' \
-device '
' \
-device '
' \
-device '
' \
-device '
' \
-device '
' \
-device '
' \
-device '
' \
-device '
' \
-device '
' \
-device '
' \
-device '
' \
-device '
' \
-blockdev '{"driver":"file","cache":
,"filename":"/home/mirror.qcow2","node-name":"drive_sys1"}' \
-blockdev '
' \
-blockdev '
' \
-blockdev '
' \
-blockdev '
' \
-blockdev '
' \
-netdev tap,id=tap0,vhost=on \
-m 4096 \
-object '
' \
-smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
-vnc :10 \
-rtc base=utc,clock=host \
-boot menu=off,strict=off,order=cdn,once=c \
-enable-kvm \
-qmp tcp:0:3333,server=on,wait=off \
-qmp tcp:0:9999,server=on,wait=off \
-qmp tcp:0:9888,server=on,wait=off \
-serial tcp:0:4444,server=on,wait=off \
-monitor stdio \
-incoming defer \
6. Start the NBD server and export the dst snapshot image:
{"execute": "nbd-server-start", "arguments": {"addr": {"type": "inet", "data":
}}, "id": "K8DYY8ot"}
{"execute": "block-export-add", "arguments":
, "id": "QWgKVFrd"}
7. On the src, add the dst image as an NBD node, and mirror from src to dst with sync=top:
{"execute": "blockdev-add", "arguments": {"node-name": "mirror", "driver": "nbd", "server":
, "export": "drive_image1"}, "id": "LqUUfWKe"}
{"execute": "blockdev-mirror", "arguments":
, "id": "ttkrHSo9"}
8. Set the pause-before-switchover migration capability on dst and prepare the incoming migration:
{"execute": "migrate-set-capabilities", "arguments": {"capabilities": [
]}, "id": "3x3JypH1"}
{"execute": "migrate-incoming", "arguments":
, "id": "20QHZEKu"}
9. Set the pause-before-switchover migration capability on src, and migrate from src to dst:
{"execute": "migrate-set-capabilities", "arguments": {"capabilities": [
]}, "id": "appJZkPR"}
{"execute": "migrate", "arguments":
, "id": "HE9PpwI3"}
10. After the src VM reaches pre-switchover status, cancel the block job and continue the migration:
{"execute": "block-job-cancel", "arguments":
, "id": "DI81beit"}
{"execute": "migrate-continue", "arguments":
, "id": "9whITPlm"}
11. Check the VM status on both src and dst:
src: {"execute": "query-status", "id": "m5IjRn2C"}
{"return": {"status": "postmigrate", "running": false}, "id": "m5IjRn2C"}
dst: {"execute": "query-status", "id": "kz0WOGQb"}
{"return": {"status": "running", "running": true}, "id": "kz0WOGQb"}
12. Reboot the dst VM:
{"execute": "system_reset", "id": "YiBrMe0a"}
Expected results
After step 12, the dst VM reboots successfully, with no core dump and no call trace in either the guest or qemu.
Actual results
After step 12, the guest VM hits a call trace:
PKRU: 55555554
Call Trace:
<IRQ>
? show_trace_log_lvl+0x1b0/0x2f0
? show_trace_log_lvl+0x1b0/0x2f0
? __watchdog_ping+0x4d/0x120
? watchdog_timer_fn.cold+0x3d/0xa0
? __pfx_watchdog_timer_fn+0x10/0x10
? __hrtimer_run_queues+0x139/0x2a0
? hrtimer_interrupt+0xfa/0x210
? __sysvec_apic_timer_interrupt+0x52/0x100
? sysvec_apic_timer_interrupt+0x6c/0x90
</IRQ>
<TASK>
? asm_sysvec_apic_timer_interrupt+0x1a/0x20
? __pfx_watchdog_ping_work+0x10/0x10
? iTCO_wdt_ping+0x3b/0x80 [iTCO_wdt]
__watchdog_ping+0x4d/0x120
watchdog_ping_work+0x44/0x50
kthread_worker_fn+0x9e/0x220
? __pfx_kthread_worker_fn+0x10/0x10
kthread+0xcf/0x100
? __pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
rcu: 3-...!: (15 ticks this GP) idle=812c/1/0x4000000000000000 softirq=3950/3950 fqs=2424
rcu: (t=70409 jiffies g=3293 q=6 ncpus=4)
rcu: rcu_preempt kthread starved for 58289 jiffies! g3293 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=2
rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt state:R running task stack:0 pid:19 tgid:19 ppid:2 flags:0x00004000
Call Trace:
<TASK>
__schedule+0x259/0x640
? __pfx_rcu_gp_kthread+0x10/0x10
schedule+0x27/0xa0
schedule_timeout+0x9e/0x160
? __pfx_process_timeout+0x10/0x10
rcu_gp_fqs_loop+0x110/0x6d0
XFS (sda2): Starting recovery (logdev: internal)
rcu_gp_kthread+0x1d5/0x270
kthread+0xcf/0x100
? __pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
rcu: Stack dump where RCU GP kthread last ran:
Sending NMI from CPU 3 to CPUs 2:
NMI backtrace for cpu 2
** replaying previous printk message **
CPU: 2 UID: 0 PID: 35 Comm: kworker/2:0 Tainted: G L ------- --- 6.12.0-31.el10.x86_64 #1
Tainted: [L]=SOFTLOCKUP
Hardware name: Red Hat KVM/RHEL, BIOS edk2-20240524-12.el10 05/24/2024
Workqueue: events drm_fb_helper_damage_work
RIP: 0010:memcpy_toio+0x71/0xb0
Code: 5c c3 cc cc cc cc 48 85 db 74 f2 40 f6 c5 01 75 45 48 83 fb 01 76 06 40 f6 c5 02 75 25 48 89 d9 48 89 ef 4c 89 e6 48 c1 e9 02 <f3> a5 f6 c3 02 74 02 66 a5 f6 c3 01 74 01 a4 5b 5d 41 5c c3 cc cc
RSP: 0018:ffffafe080147d70 EFLAGS: 00010206
RAX: 0000000000000000 RBX: 0000000000001400 RCX: 0000000000000270
RDX: 0000000000001400 RSI: ffffafe080886240 RDI: ffffafe082575240
RBP: ffffafe082574800 R08: ffff997c002e1140 R09: 0000000000000007
R10: 0000000000000007 R11: 0000000000000000 R12: ffffafe080885800
R13: 000000000000012a R14: ffffafe080885800 R15: ffff997c13c9af00
FS: 0000000000000000(0000) GS:ffff997c7bd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f57fb08bbe0 CR3: 000000010d25e001 CR4: 00000000007706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<NMI>
? show_trace_log_lvl+0x1b0/0x2f0
? show_trace_log_lvl+0x1b0/0x2f0
? drm_fbdev_ttm_damage_blit_real+0xcd/0x170 [drm_ttm_helper]
? nmi_cpu_backtrace.cold+0x32/0x63
? nmi_cpu_backtrace_handler+0x11/0x20
? nmi_handle+0x5e/0x120
? default_do_nmi+0x40/0x130
? exc_nmi+0x103/0x180
? end_repeat_nmi+0xf/0x53
? memcpy_toio+0x71/0xb0
? memcpy_toio+0x71/0xb0
? memcpy_toio+0x71/0xb0
</NMI>
<TASK>
drm_fbdev_ttm_damage_blit_real+0xcd/0x170 [drm_ttm_helper]
drm_fbdev_ttm_helper_fb_dirty+0x129/0x1c0 [drm_ttm_helper]
drm_fb_helper_damage_work+0x91/0x170
process_one_work+0x174/0x330
worker_thread+0x252/0x390
? __pfx_worker_thread+0x10/0x10
kthread+0xcf/0x100
? __pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
CPU: 3 UID: 0 PID: 96 Comm: kworker/3:2 Tainted: G L ------- --- 6.12.0-31.el10.x86_64 #1
Tainted: [L]=SOFTLOCKUP
Hardware name: Red Hat KVM/RHEL, BIOS edk2-20240524-12.el10 05/24/2024
Workqueue: pm pm_runtime_work
RIP: 0010:_raw_spin_unlock_irqrestore+0x1f/0x30
Code: 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 e8 82 07 00 00 90 f7 c6 00 02 00 00 74 01 fb 65 ff 0d c9 a9 df 67 <74> 05 c3 cc cc cc cc 0f 1f 44 00 00 c3 cc cc cc cc 90 90 90 90 90
RSP: 0018:ffffafe080427ad0 EFLAGS: 00000286
RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000002 RSI: 0000000000000297 RDI: ffff997c08f8c294
RBP: ffff997c00e285e0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: ffff997c00ff93c0 R12: ffff997c08f8cc10
R13: ffff997c08f8c000 R14: ffff997c08f8c294 R15: ffff997c08f8c000
FS: 0000000000000000(0000) GS:ffff997c7bd80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000056079b25e088 CR3: 000000010d1b2001 CR4: 00000000007706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<IRQ>
? show_trace_log_lvl+0x1b0/0x2f0
? show_trace_log_lvl+0x1b0/0x2f0
? xhci_hub_control+0x974/0xf70
? rcu_dump_cpu_stacks+0xac/0xf0
? print_cpu_stall.cold+0x36/0xd8
? check_cpu_stall+0x116/0x220
? rcu_pending+0x2f/0x220
? _raw_spin_unlock_irqrestore+0xe/0x30
? timekeeping_advance+0x372/0x580
? rcu_sched_clock_irq+0xa8/0x370
? update_process_times+0x70/0xb0
? tick_nohz_handler+0x8f/0x140
? __pfx_tick_nohz_handler+0x10/0x10
? __hrtimer_run_queues+0x139/0x2a0
? hrtimer_interrupt+0xfa/0x210
? __sysvec_apic_timer_interrupt+0x52/0x100
? sysvec_apic_timer_interrupt+0x6c/0x90
</IRQ>
<TASK>
? asm_sysvec_apic_timer_interrupt+0x1a/0x20
? _raw_spin_unlock_irqrestore+0x1f/0x30
xhci_hub_control+0x974/0xf70
rh_call_control+0x28a/0x9d0
usb_hcd_submit_urb+0xe1/0x2c0
usb_start_wait_urb+0x8a/0x190
usb_control_msg+0xed/0x150
usb_port_suspend+0x23f/0x480
usb_generic_driver_suspend+0x2c/0x50
usb_suspend_both+0x190/0x250
usb_runtime_suspend+0x2e/0x70
? __pfx_usb_runtime_suspend+0x10/0x10
__rpm_callback+0x41/0x170
? __pfx_usb_runtime_suspend+0x10/0x10
rpm_callback+0x55/0x60
? __pfx_usb_runtime_suspend+0x10/0x10
rpm_suspend+0xe6/0x5f0
? _raw_spin_unlock+0xe/0x30
? finish_task_switch.isra.0+0x99/0x2b0
pm_runtime_work+0x98/0xb0
process_one_work+0x174/0x330
worker_thread+0x252/0x390
? __pfx_worker_thread+0x10/0x10
kthread+0xcf/0x100
? __pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
(1 of 3) Job systemd-fsck@dev-disk-…vice/start running (3s / no limit)
(1 of 3) Job systemd-fsck@dev-disk-…tart running (1min 14s / no limit)
[  OK  ] Finished systemd-fsck@dev-disk-by\…eck on /dev/disk/by-uuid/285B-EBFD.
watchdog: BUG: soft lockup - CPU#2 stuck for 57s! [watchdogd:69]
rcu: INFO: rcu_preempt self-detected stall on CPU
clocksource: Long readout interval, skipping watchdog check: cs_nsec: 70600608128 wd_nsec: 70600606676
Modules linked in: intel_rapl_msr intel_rapl_common intel_uncore_frequency_common skx_edac_common nfit libnvdimm iTCO_wdt iTCO_vendor_support rapl i2c_i801 i2c_smbus lpc_ich sg joydev pcspkr loop fuse dm_multipath nfnetlink xfs sd_mod ahci libahci virtio_net crct10dif_pclmul crc32_pclmul crc32c_intel libata net_failover ghash_clmulni_intel failover virtio_scsi nvme_tcp bochs nvme_fabrics drm_vram_helper nvme_keyring drm_ttm_helper ttm nvme_core nvme_auth serio_raw dm_mirror dm_region_hash dm_log dm_mod be2iscsi iscsi_boot_sysfs cxgb4i cxgb4 tls libcxgbi libcxgb iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
CPU: 2 UID: 0 PID: 69 Comm: watchdogd Not tainted 6.12.0-31.el10.x86_64 #1
Hardware name: Red Hat KVM/RHEL, BIOS edk2-20240524-12.el10 05/24/2024
RIP: 0010:iTCO_wdt_ping+0x3b/0x80 [iTCO_wdt]
Code: 00 48 8d ab e0 00 00 00 48 89 ef e8 2f 8a 64 d7 83 bb c0 00 00 00 01 76 22 48 8b 83 c8 00 00 00 48 8b 10 b8 01 00 00 00 66 ef <48> 89 ef e8 0d 8b 64 d7 31 c0 5b 5d c3 cc cc cc cc 75 ed 48 8b 83
RSP: 0018:ffffafe080257e60 EFLAGS: 00000202
RAX: 0000000000000001 RBX: ffff997c0a0ffc28 RCX: 0000000000179423
RDX: 0000000000000660 RSI: ffffffff9a803080 RDI: ffff997c0a0ffd08
RBP: ffff997c0a0ffd08 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000054 R11: 0000000000000000 R12: ffff997c391f1000
R13: ffff997c00839e04 R14: 0000000000000001 R15: ffffffff97e63fd0
FS: 0000000000000000(0000) GS:ffff997c7bd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055fc67efcd75 CR3: 000000010cd98003 CR4: 00000000007706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<IRQ>
? show_trace_log_lvl+0x1b0/0x2f0
? show_trace_log_lvl+0x1b0/0x2f0
? __watchdog_ping+0x4d/0x120
? watchdog_timer_fn.cold+0x3d/0xa0
? __pfx_watchdog_timer_fn+0x10/0x10
? __hrtimer_run_queues+0x139/0x2a0
? hrtimer_interrupt+0xfa/0x210
? __sysvec_apic_timer_interrupt+0x52/0x100
? sysvec_apic_timer_interrupt+0x6c/0x90
</IRQ>
<TASK>
? asm_sysvec_apic_timer_interrupt+0x1a/0x20
? __pfx_watchdog_ping_work+0x10/0x10
XFS (sda2): Ending recovery (logdev: internal)
? iTCO_wdt_ping+0x3b/0x80 [iTCO_wdt]
__watchdog_ping+0x4d/0x120
watchdog_ping_work+0x44/0x50
kthread_worker_fn+0x9e/0x220
[  OK  ] Mounted boot.mount - /boot.
? __pfx_kthread_worker_fn+0x10/0x10
Mounting boot-efi.mount - /boot/efi...
kthread+0xcf/0x100
? __pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
? __pfx_kthread+0x10/0x10
[  OK  ] Started lvm-activate-rhel_bootp-73…vation event rhel_bootp-73-226-159.
ret_from_fork_asm+0x1a/0x30
</TASK>
rcu: 3-...!: (15 ticks this GP) idle=812c/1/0x4000000000000000 softirq=3950/3950 fqs=2424
rcu: (t=70409 jiffies g=3293 q=6 ncpus=4)
rcu: rcu_preempt kthread starved for 58289 jiffies! g3293 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=2
rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt state:R running task stack:0 pid:19 tgid:19 ppid:2 flags:0x00004000
Call Trace:
<TASK>
__schedule+0x259/0x640
? __pfx_rcu_gp_kthread+0x10/0x10
schedule+0x27/0xa0
schedule_timeout+0x9e/0x160
[  OK  ] Mounted boot-efi.mount - /boot/efi.
[  OK  ] Reached target local-fs.target - Local File Systems.