Bug
Resolution: Done
rhel-10.1
rhel-virt-core-libvirt-1
ssg_virtualization
aarch64
What were you trying to do that didn't work?
libvirtd/virtqemud crashed on the target host while resuming a postcopy migration.
What is the impact of this issue to you?
It causes the migration.pause_postcopy_migration_and_recover.pause_and_io_error_and_recover test to fail.
Please provide the package NVR for which the bug is seen:
RHEL-10.1-20250507.1
libvirt-11.2.0-1.el10.aarch64
qemu-kvm-10.0.0-1.el10.aarch64
How reproducible is this bug?:
100%
Steps to reproduce
- Complete the basic setup for migration between RHEL 10.1 machines with NFS storage and SSH
- Define the VM (avocado-vt-vm1)
- # virsh migrate avocado-vt-vm1 --desturi qemu+ssh://10.6.12.65/system --live --verbose --timeout 10 --timeout-postcopy --postcopy --bandwidth 15 --postcopy-bandwidth 15
- In another terminal, abort the postcopy migration:
- # virsh domjobabort avocado-vt-vm1 --postcopy
- The first terminal then reports the following error:
- Migration: [25.73 %]error: operation failed: job 'migration in' failed in post-copy phase
- Resume the postcopy migration:
- # virsh migrate avocado-vt-vm1 --desturi qemu+ssh://10.6.12.65/system --live --verbose --timeout 10 --timeout-postcopy --postcopy --bandwidth 15 --postcopy-bandwidth 15 --postcopy-resume
error: End of file while reading data: Warning: Permanently added '10.6.12.65' (ED25519) to the list of known hosts.
virt-ssh-helper: could not proxy traffic: End of file while reading data: Input/output error: Input/output error
- # virsh migrate avocado-vt-vm1 --desturi qemu+ssh://10.6.12.65/system --live --verbose --timeout 10 --timeout-postcopy --postcopy --bandwidth 15 --postcopy-bandwidth 15 --postcopy-resume
- Check the status of the VM
- On source: 2 avocado-vt-vm1 paused
- On target: 4 avocado-vt-vm1 running
- Check the virtqemud coredump file on target host
- # coredumpctl
TIME PID UID GID SIG COREFILE EXE SIZE
Thu 2025-05-08 10:59:28 EDT 43463 0 0 SIGSEGV present /usr/sbin/virtqemud 3M
Thu 2025-05-08 12:19:46 EDT 53571 0 0 SIGSEGV present /usr/sbin/virtqemud 3M
Thu 2025-05-08 12:51:38 EDT 75562 0 0 SIGSEGV present /usr/sbin/virtqemud 2.9M
Thu 2025-05-08 19:39:16 EDT 90156 0 0 SIGSEGV present /usr/sbin/libvirtd 5.2M
- # coredumpctl dump 90156
PID: 90156 (libvirtd)
UID: 0 (root)
GID: 0 (root)
Signal: 11 (SEGV)
Timestamp: Thu 2025-05-08 19:39:16 EDT (24min ago)
Command Line: /usr/sbin/libvirtd --timeout 120
Executable: /usr/sbin/libvirtd
Control Group: /system.slice/libvirtd.service
Unit: libvirtd.service
Slice: system.slice
Boot ID: b86ae612ceab4dbeb0f789a46dd81115
Machine ID: 3bc95fe2e2cd4e22bca2d385b320d97a
Hostname: nvidia-grace-hopper-03.khw.eng.rdu2.dc.redhat.com
Storage: /var/lib/systemd/coredump/core.libvirtd.0.b86ae612ceab4dbeb0f789a46dd81115.90156.1746747556000000.zst (present)
Size on Disk: 5.2M
Message: Process 90156 (libvirtd) of user 0 dumped core.
- [Note: the /usr/sbin/virtqemud coredump files came from hitting this same input/output error while running the libvirt test case. Following the manual steps above produces only a /usr/sbin/libvirtd coredump.]
- # coredumpctl dump 75562
PID: 75562 (virtqemud)
UID: 0 (root)
GID: 0 (root)
Signal: 11 (SEGV)
Timestamp: Thu 2025-05-08 12:51:38 EDT (7h ago)
Command Line: /usr/sbin/virtqemud --timeout 120
Executable: /usr/sbin/virtqemud
Control Group: /system.slice/virtqemud.service
Unit: virtqemud.service
Slice: system.slice
Boot ID: b86ae612ceab4dbeb0f789a46dd81115
Machine ID: 3bc95fe2e2cd4e22bca2d385b320d97a
Hostname: nvidia-grace-hopper-03.khw.eng.rdu2.dc.redhat.com
Storage: /var/lib/systemd/coredump/core.virtqemud.0.b86ae612ceab4dbeb0f789a46dd81115.75562.1746723098000000.zst (present)
Size on Disk: 2.9M
Message: Process 75562 (virtqemud) of user 0 dumped core.
Module [dso] from rpm libssh-0.11.1-1.el10.aarch64
Module libcap.so.2 from rpm libcap-2.69-7.el10.aarch64
Module libnss_systemd.so.2 from rpm systemd-257-11.el10.aarch64
Module libnbd.so.0 from rpm libnbd-1.22.2-1.el10.aarch64
Module libvirt_driver_qemu.so from rpm libvirt-11.3.0-1.el10.aarch64
Module libbrotlicommon.so.1 from rpm brotli-1.1.0-6.el10.aarch64
Module libevent-2.1.so.7 from rpm libevent-2.1.12-16.el10.aarch64
Module libkeyutils.so.1 from rpm keyutils-1.6.3-5.el10.aarch64
Module libkrb5support.so.0 from rpm krb5-1.21.3-7.el10.aarch64
Module libblkid.so.1 from rpm util-linux-2.40.2-10.el10.aarch64
Module libbrotlidec.so.1 from rpm brotli-1.1.0-6.el10.aarch64
Module libssl.so.3 from rpm openssl-3.5.0-2.el10.aarch64
Module libpsl.so.5 from rpm libpsl-0.21.5-6.el10.aarch64
Module libnghttp2.so.14 from rpm nghttp2-1.64.0-2.el10.aarch64
Module libcrypt.so.2 from rpm libxcrypt-4.4.36-10.el10.aarch64
Module libcrypto.so.3 from rpm openssl-3.5.0-2.el10.aarch64
Module libtasn1.so.6 from rpm libtasn1-4.20.0-1.el10.aarch64
Module libunistring.so.5 from rpm libunistring-1.1-10.el10.aarch64
Module libidn2.so.0 from rpm libidn2-2.3.7-3.el10.aarch64
Module libp11-kit.so.0 from rpm p11-kit-0.25.5-7.el10.aarch64
Module libattr.so.1 from rpm attr-2.5.2-5.el10.aarch64
Module liblzma.so.5 from rpm xz-5.6.2-3.el10.aarch64
Module libcom_err.so.2 from rpm e2fsprogs-1.47.1-3.el10.aarch64
Module libk5crypto.so.3 from rpm krb5-1.21.3-7.el10.aarch64
Module libkrb5.so.3 from rpm krb5-1.21.3-7.el10.aarch64
Module libgssapi_krb5.so.2 from rpm krb5-1.21.3-7.el10.aarch64
Module libmount.so.1 from rpm util-linux-2.40.2-10.el10.aarch64
Module libz.so.1 from rpm zlib-ng-2.2.3-2.el10.aarch64
Module libgmodule-2.0.so.0 from rpm glib2-2.80.4-4.el10.aarch64
Module libffi.so.8 from rpm libffi-3.4.4-9.el10.aarch64
Module libpcre2-8.so.0 from rpm pcre2-10.44-1.el10.3.aarch64
Module libcurl.so.4 from rpm curl-8.12.1-2.el10.aarch64
Module libsasl2.so.3 from rpm cyrus-sasl-2.1.28-27.el10.aarch64
Module libssh.so.4 from rpm libssh-0.11.1-1.el10.aarch64
Module libselinux.so.1 from rpm libselinux-3.8-1.el10.aarch64
Module libnuma.so.1 from rpm numactl-2.0.19-1.el10.aarch64
Module libnl-3.so.200 from rpm libnl3-3.11.0-1.el10.aarch64
Module libjson-c.so.5 from rpm json-c-0.18-3.el10.aarch64
Module libgnutls.so.30 from rpm gnutls-3.8.9-14.el10.aarch64
Module libcap-ng.so.0 from rpm libcap-ng-0.8.4-6.el10.aarch64
Module libaudit.so.1 from rpm audit-4.0.3-4.el10.aarch64
Module libacl.so.1 from rpm acl-2.3.2-4.el10.aarch64
Module libxml2.so.2 from rpm libxml2-2.12.5-5.el10_0.aarch64
Module libtirpc.so.3 from rpm libtirpc-1.3.5-1.el10.aarch64
Module libgio-2.0.so.0 from rpm glib2-2.80.4-4.el10.aarch64
Module libgobject-2.0.so.0 from rpm glib2-2.80.4-4.el10.aarch64
Module libglib-2.0.so.0 from rpm glib2-2.80.4-4.el10.aarch64
Module libvirt-qemu.so.0 from rpm libvirt-11.3.0-1.el10.aarch64
Module libvirt-lxc.so.0 from rpm libvirt-11.3.0-1.el10.aarch64
Module libvirt.so.0 from rpm libvirt-11.3.0-1.el10.aarch64
Stack trace of thread 75566:
#0 0x0000ffff305c8964 qemuProcessIncomingDefNew (libvirt_driver_qemu.so + 0x138964)
#1 0x0000ffff3057ea14 qemuMigrationDstPrepare (libvirt_driver_qemu.so + 0xeea14)
#2 0x0000ffff3057fb54 qemuMigrationDstPrepareAny (libvirt_driver_qemu.so + 0xefb54)
#3 0x0000ffff305811b4 qemuMigrationDstPrepareDirect (libvirt_driver_qemu.so + 0xf11b4)
#4 0x0000ffff30552870 qemuDomainMigratePrepare3Params.lto_priv.0 (libvirt_driver_qemu.so + 0xc2870)
#5 0x0000ffff355b2ca8 virDomainMigratePrepare3Params (libvirt.so.0 + 0x302ca8)
#6 0x0000aaada8f5e14c n/a (n/a + 0x0)
#7 0x0000aaada8f5e14c n/a (n/a + 0x0)
#8 0x0057ffff3549d1b4 n/a (n/a + 0x0)
#9 0x007cffff3549d718 n/a (n/a + 0x0)
#10 0x006effff3549d858 n/a (n/a + 0x0)
#11 0x000dffff353cd9cc n/a (n/a + 0x0)
#12 0x0021ffff353cc270 n/a (n/a + 0x0)
#13 0x004cffff34c79ea8 n/a (n/a + 0x0)
#14 0x006dffff34ce360c n/a (n/a + 0x0)
ELF object binary architecture: AARCH64
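The manual steps above can be wrapped in a few shell helpers. This is only a sketch: the function names (`run`, `migrate_postcopy`, `abort_postcopy`) and the `DOMAIN`, `DEST_URI` and `DRY_RUN` variables are made up for illustration, while the virsh commands and flags are copied from the steps in this report. By default it only prints the commands.

```shell
#!/bin/sh
# Illustrative helpers for the reproduction steps; the names are hypothetical.
# The virsh invocations themselves are taken verbatim from this report.
DOMAIN=${DOMAIN:-avocado-vt-vm1}
DEST_URI=${DEST_URI:-qemu+ssh://10.6.12.65/system}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the command, 0 = also execute it

run() {
    # Echo the command line, then execute it unless in dry-run mode.
    echo "+ $*"
    [ "$DRY_RUN" = 1 ] || "$@"
}

migrate_postcopy() {
    # Extra arguments are appended, e.g. --postcopy-resume for the recovery attempt.
    run virsh migrate "$DOMAIN" --desturi "$DEST_URI" --live --verbose \
        --timeout 10 --timeout-postcopy --postcopy \
        --bandwidth 15 --postcopy-bandwidth 15 "$@"
}

abort_postcopy() {
    # Run this from a second terminal while the migration is in progress.
    run virsh domjobabort "$DOMAIN" --postcopy
}
```

With the helpers sourced on the source host, the sequence would be `migrate_postcopy` in the first terminal, `abort_postcopy` in a second one, and then `migrate_postcopy --postcopy-resume` in the first terminal again, with `DRY_RUN=0` set to actually run the commands.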
Expected results
Postcopy migration can be completed successfully
Actual results
libvirtd/virtqemud crashed on target host during resume postcopy migration
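One way to see which daemon crashed, and how often, is to tally the SIGSEGV entries in the `coredumpctl` listing on the target host. The helper below is a rough sketch; `count_segv_by_exe` is a hypothetical name, and it assumes the column layout shown in this report, where the signal name is followed by the COREFILE and EXE columns.

```shell
# Tally SIGSEGV core dumps per executable from a `coredumpctl` listing on stdin.
# Assumption: the EXE column is the second field after the signal name,
# as in the listing included in this report.
count_segv_by_exe() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i == "SIGSEGV" && (i + 2) <= NF)
                count[$(i + 2)]++
    }
    END {
        for (exe in count)
            print count[exe], exe
    }' | sort -k2
}

# Example, on the target host:
#   coredumpctl --no-pager list | count_segv_by_exe
```

Fed the four listing lines from the steps above, this would report three dumps for /usr/sbin/virtqemud and one for /usr/sbin/libvirtd.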
Note: this is a clone of RHEL-90160, which tracks the same issue on x86_64. Although both architectures hit the crash, the migration.pause_postcopy_migration_and_recover.pause_and_io_error_and_recover test fails only on aarch64, not on x86_64.
- duplicates RHEL-90160 virtqemud crashed on target host during resume postcopy migration (Release Pending)