RHEL-114359

[postcopy] qemu-kvm crashed (Assertion `f == 0' failed) on the destination host during forward migration on Grace Hopper


    • No
    • Critical
    • 1
    • rhel-virt-hwe-arm-1
    • 0
    • False
    • False
    • None
    • Split items
    • None
    • None
    • Unspecified
    • Unspecified
    • Unspecified
    • aarch64
    • None

      What were you trying to do that didn't work?

      Migrate a VM from RHEL 10.0 to RHEL 10.1 on Grace Hopper (GH) with postcopy.

      Please provide the package NVR for which the bug is seen:

      Source:
      qemu-kvm-9.1.0-15.el10_0.4.aarch64
      edk2-aarch64-20241117-2.el10_0.1.noarch
      6.12.0-55.33.1.el10_0.aarch64+64k
      Dest:
      qemu-kvm-10.0.0-13.el10_1.aarch64
      edk2-aarch64-20250523-2.el10.noarch
      6.12.0-124.el10.aarch64+64k

      How reproducible is this bug?:

      Steps to reproduce

      1. Set up the migration environment.
      2. Run the migration command:

        virsh migrate --live --p2p --persistent --undefinesource --bandwidth 1000 --postcopy --xml /var/tmp/xml_utils_temp_ibehzvxi.xml --persistent-xml /var/tmp/xml_utils_temp_fi27x1wd.xml --domain avocado-vt-vm1 --desturi qemu+tcp://<dest_ip>/system
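To re-run the reproducer against other hosts or domains, the command above can be assembled programmatically. This is only a sketch: `build_migrate_argv` and its parameter names are hypothetical, and the flags are copied verbatim from the command in step 2 rather than taken from any test framework.

```python
import shlex
from typing import List, Optional


def build_migrate_argv(
    domain: str,
    dest_host: str,
    xml_path: Optional[str] = None,
    persistent_xml_path: Optional[str] = None,
    bandwidth: int = 1000,
) -> List[str]:
    """Hypothetical helper: rebuild the virsh migrate command from step 2."""
    argv = [
        "virsh", "migrate",
        "--live", "--p2p", "--persistent", "--undefinesource",
        "--bandwidth", str(bandwidth),
        "--postcopy",
    ]
    # The XML arguments are optional here; the reproducer passed both.
    if xml_path:
        argv += ["--xml", xml_path]
    if persistent_xml_path:
        argv += ["--persistent-xml", persistent_xml_path]
    argv += ["--domain", domain, "--desturi", f"qemu+tcp://{dest_host}/system"]
    return argv


def as_shell(argv: List[str]) -> str:
    """Render the argument list as a copy-pasteable shell command."""
    return shlex.join(argv)
```

Passing the list to `subprocess.run(argv)` avoids shell quoting issues; `as_shell` is only there for logging the exact command.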

      Expected results

        Migration is successful

        Actual results

        qemu-kvm crashed on the destination host.
        QEMU log on source:

        2025-09-12 01:59:24.913+0000: initiating migration
        2025-09-12T01:59:28.539780Z qemu-kvm: failed to save SaveStateEntry with id(name): 3(ram): -5
        2025-09-12T01:59:28.539888Z qemu-kvm: Unable to shutdown socket: Transport endpoint is not connected
        2025-09-12T01:59:28.539923Z qemu-kvm: Unable to shutdown socket: Bad file descriptor
        2025-09-12T01:59:28.539947Z qemu-kvm: Detected IO failure for postcopy. Migration paused.
        2025-09-12T02:00:14.206350Z qemu-kvm: terminating on signal 15 from pid 258716 (/usr/sbin/virtqemud)
        2025-09-12 02:00:15.006+0000: shutting down, reason=destroyed

        QEMU log on destination:

        2025-09-12 01:59:24.387+0000: 220528: info : hostname: nvidia-grace-hopper-03.khw.eng.rdu2.dc.redhat.com
        2025-09-12 01:59:24.387+0000: 220528: info : virObjectUnref:378 : OBJECT_UNREF: obj=0xffff28007900
        char device redirected to /dev/pts/7 (label charserial0)
        2025-09-12T01:59:28.539463Z qemu-kvm: error while loading state for instance 0x0 of device 'cpu'
        2025-09-12T01:59:28.539610Z qemu-kvm: error while loading state section id 3(ram)
        qemu-kvm: ../util/oslib-posix.c:247: void qemu_socket_set_nonblock(int): Assertion `f == 0' failed.
        2025-09-12 01:59:28.928+0000: shutting down, reason=failed
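For triage context: the failed assertion is in `qemu_socket_set_nonblock()`, which, as far as I understand the QEMU source (an assumption to verify against the installed build), asserts that switching the descriptor to non-blocking mode via `fcntl()` returned 0. The Python sketch below mimics that pattern to show one way such a check can fail, namely when the descriptor is already closed. `try_set_nonblock` and `demo` are hypothetical names, and this is not a root-cause diagnosis.

```python
import errno
import fcntl
import os
import socket
from typing import Tuple


def try_set_nonblock(fd: int) -> int:
    """Hypothetical helper: return 0 on success, -errno on failure."""
    try:
        flags = fcntl.fcntl(fd, fcntl.F_GETFL)
        fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)
    except OSError as exc:
        return -exc.errno
    return 0


def demo() -> Tuple[int, int]:
    """Return the result for a live socket and for a closed descriptor."""
    a, b = socket.socketpair()
    try:
        ok = try_set_nonblock(a.fileno())
        stale_fd = b.fileno()
        b.close()  # stale_fd no longer refers to an open file
        bad = try_set_nonblock(stale_fd)
    finally:
        a.close()
    return ok, bad


if __name__ == "__main__":
    ok, bad = demo()
    print("live socket:", ok)
    print("closed descriptor:", bad, errno.errorcode.get(-bad))
```

A non-zero result on the closed descriptor is the condition an `assert(f == 0)` style check would trip on. Whether the destination actually hit a closed or otherwise invalid descriptor needs to be confirmed from the attached core file.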

      Test log:
      Running '/bin/virsh -c 'qemu:///system' migrate-postcopy avocado-vt-vm1'
      Command '/bin/virsh -c 'qemu:///system' migrate-postcopy avocado-vt-vm1' finished with 0 after 0.022879207s
      [stderr] error: internal error: QEMU unexpectedly closed the monitor (vm='avocado-vt-vm1'): 2025-09-12T01:59:28.539463Z qemu-kvm: error while loading state for instance 0x0 of device 'cpu'
      [stderr] 2025-09-12T01:59:28.539610Z qemu-kvm: error while loading state section id 3(ram)
      [stderr] qemu-kvm: ../util/oslib-posix.c:247: void qemu_socket_set_nonblock(int): Assertion `f == 0' failed.
      [stdlog] 2025-09-11 21:59:29,543 avocado.utils.process process L0714 INFO | Command '/bin/virsh -c 'qemu:///system' migrate --live --p2p --persistent --undefinesource --bandwidth 1000 --postcopy --xml /var/tmp/xml_utils_temp_ibehzvxi.xml --persistent-xml /var/tmp/xml_utils_temp_fi27x1wd.xml --domain avocado-vt-vm1 --desturi qemu+tcp://10.6.12.65/system' finished with 1 after 5.262243140s

      See attachment for core file.

      Automated test case: (1/1) type_specific.io-github-autotest-libvirt.migrate.migration_with_numa_topology.with_back_migration.postcopy.base_options.multi_cluster_on_numa: FAIL: error: internal error: QEMU unexpectedly closed the monitor (vm='avocado-vt-vm1'): 2025-09-12T01:59:28.539463Z qemu-kvm: error while loading state for instance 0x0 of device 'cpu'\n2025-09-12T01:59:28.539610Z qemu-kvm: error while loading state section id ... (250.48 s)
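When checking other runs for the same failure, a small filter can pull the assertion details out of QEMU or test logs. This is a sketch only: the regular expression is an assumption modelled on the assertion line quoted in this report, and `find_assertions` is a hypothetical helper.

```python
import re
from typing import Dict, Iterable, List

# Assumed format, based on the line in this report:
#   qemu-kvm: ../util/oslib-posix.c:247: void qemu_socket_set_nonblock(int): Assertion `f == 0' failed.
ASSERT_RE = re.compile(
    r"(?P<prog>\S+): (?P<file>[^:\s]+):(?P<line>\d+): "
    r"(?P<func>.+?): Assertion `(?P<expr>.+?)' failed\."
)


def find_assertions(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Return one dict per assertion-failure line found in the log."""
    hits = []
    for text in lines:
        match = ASSERT_RE.search(text)
        if match:
            hits.append(match.groupdict())
    return hits


if __name__ == "__main__":
    sample = [
        "2025-09-12T01:59:28.539610Z qemu-kvm: error while loading state section id 3(ram)",
        "[stderr] qemu-kvm: ../util/oslib-posix.c:247: void qemu_socket_set_nonblock(int): "
        "Assertion `f == 0' failed.",
    ]
    for hit in find_assertions(sample):
        print(hit["file"], hit["line"], hit["func"], hit["expr"])
```

The pattern uses `search` rather than `match` so that prefixes such as `[stderr]` do not prevent a hit.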

              virt-maint virt-maint
              rhn-support-dzheng Dan Zheng
              virt-maint virt-maint
              virt-bugs virt-bugs