Uploaded image for project: 'RHEL'
  1. RHEL
  2. RHEL-22166

Recover postcopy returned with error '"job 'migration in' failed in post-copy phase' but the migration was recovered successfully in fact

    • None
    • None
    • ZStream
    • sst_virtualization
    • ssg_virtualization
    • 5
    • QE ack, Dev ack
    • False
    • Hide

      None

      Show
      None
    • None
    • None
    • Approved Blocker
    • None
    • None
    • 10.7.0
    • None

      What were you trying to do that didn't work?

      Recover postcopy returned with error '"job 'migration in' failed in post-copy phase' but the migration was recovered successfully in fact.

      Please provide the package NVR for which bug is seen:

      libvirt-10.0.0-1.el9.x86_64

      qemu-kvm-8.2.0-2.el9.x86_64

      How reproducible:

      100%

      Steps to reproduce

      1. Monitor events on both source and target host:
          # virsh event --loop --all
      
      2.Do migration:
      
          # virsh migrate vm2 qemu+ssh://X.X.X.com/system --persistent --verbose --live --p2p --bandwidth 5 --postcopy
      
      3. Switch to postcopy before migration completes:
      
          # virsh migrate-postcopy vm2
      
      4.Abort postcopy migraiton before migration completes:
      
          # virsh domjobabort vm2 --postcopy
          error: Requested operation is not valid: domain is not running
      
      5.Recover postcopy migraiton:
      
          # virsh migrate vm2 qemu+ssh://X.X.X.com/system --persistent --verbose --live --p2p --bandwidth 5 --postcopy --postcopy-resume
          error: operation failed: job 'migration in' failed in post-copy phase
      
      6.Try Recover postcopy again:
      
          # virsh migrate vm2 qemu+ssh://X.X.X.com/system --persistent --verbose --live --p2p --bandwidth 5 --postcopy --postcopy-resume
          error: Requested operation is not valid: QEMU reports migration is still running
      
      7.After a few minutes, the guest was running on target host and shutdown on the source guest
      
      8. Check the events on both source and target host:
      
          Source host:
      
          # virsh event --all --loop
          event 'migration-iteration' for domain 'vm2': iteration: '1'
          event 'lifecycle' for domain 'vm2': Suspended Migrated
          event 'lifecycle' for domain 'vm2': Suspended Post-copy
          event 'migration-iteration' for domain 'vm2': iteration: '2'
          event 'lifecycle' for domain 'vm2': Suspended Post-copy Error
          *event 'lifecycle' for domain 'vm2': Suspended Post-copy*
          *event 'lifecycle' for domain 'vm2': Suspended Post-copy Error*
          event 'lifecycle' for domain 'vm2': Stopped Migrated
          event 'job-completed' for domain 'vm2':
              operation: 5
              time_elapsed: 21752
              downtime: 184
              setup_time: 160
              data_total: 2160803840
              data_processed: 45301213
              data_remaining: 2112503808
              memory_total: 2160803840
              memory_processed: 45301213
              memory_remaining: 2112503808
              memory_bps: 5257170
              memory_constant: 3014
              memory_normal: 8832
              memory_normal_bytes: 36175872
              memory_dirty_rate: 7
              memory_iteration: 2
              memory_postcopy_requests: 0
              memory_page_size: 4096
              disk_total: 0
              disk_processed: 0
              disk_remaining: 0
      
          target host:
      
          # virsh event --all --loop
          event 'agent-lifecycle' for domain 'vm2': state: 'disconnected' reason: 'domain started'
          event 'lifecycle' for domain 'vm2': Started Migrated
          event 'agent-lifecycle' for domain 'vm2': state: 'disconnected' reason: 'channel event'
          event 'lifecycle' for domain 'vm2': Defined Updated
          event 'lifecycle' for domain 'vm2': Resumed Post-copy
          event 'agent-lifecycle' for domain 'vm2': state: 'connected' reason: 'channel event'
          event 'lifecycle' for domain 'vm2': Resumed Post-copy Error
          event 'lifecycle' for domain 'vm2': Resumed Post-copy
          event 'lifecycle' for domain 'vm2': Resumed Migrated
      

      Expected results

      Should not report error if recover successfully.

      Actual results

      Recover postcopy returned with error '"job 'migration in' failed in post-copy phase' but the migration was recovered successfully in fact.

      Additional info:

      1.It's a regression issue and can not reproduce with libvirt-9.10.0-1.el9.x86_64.

      2.Error info in libvirtd log:

       # cat /var/log/libvirt/libvirtd.log | grep -i error
      2024-01-19 07:56:08.323+0000: 87228: error : virNetClientProgramDispatchError:170 : operation failed: job 'migration in' failed in post-copy phase
      2024-01-19 07:56:08.335+0000: 87228: debug : virNetServerProgramSendError:147 : prog=536903814 ver=1 proc=305 type=1 serial=11 msg=0x55eac5a72f60 rerr=0x7f487ddfa9a0
      2024-01-19 07:56:14.740+0000: 89328: debug : qemuMonitorJSONIOProcessLine:191 : Line [{"return": {"status": "postcopy-paused", "setup-time": 171, "error-desc": "Postcopy migration is paused by the user", "downtime": 192, "total-time": 10816, "ram": {"total": 2160803840, "postcopy-requests": 260, "dirty-sync-count": 2, "multifd-bytes": 0, "pages-per-second": 28666, "downtime-bytes": 0, "page-size": 4096, "remaining": 101707776, "postcopy-bytes": 97200224, "mbps": 940.56274285714289, "transferred": 114720833, "dirty-sync-missed-zero-copy": 0, "precopy-bytes": 16828994, "duplicate": 475968, "dirty-pages-rate": 0, "skipped": 0, "normal-bytes": 109531136, "normal": 26741}}, "id": "libvirt-463"}]
      2024-01-19 07:56:14.740+0000: 89328: info : qemuMonitorJSONIOProcessLine:210 : QEMU_MONITOR_RECV_REPLY: mon=0x7f486804ed00 reply=\{"return": {"status": "postcopy-paused", "setup-time": 171, "error-desc": "Postcopy migration is paused by the user", "downtime": 192, "total-time": 10816, "ram": {"total": 2160803840, "postcopy-requests": 260, "dirty-sync-count": 2, "multifd-bytes": 0, "pages-per-second": 28666, "downtime-bytes": 0, "page-size": 4096, "remaining": 101707776, "postcopy-bytes": 97200224, "mbps": 940.56274285714289, "transferred": 114720833, "dirty-sync-missed-zero-copy": 0, "precopy-bytes": 16828994, "duplicate": 475968, "dirty-pages-rate": 0, "skipped": 0, "normal-bytes": 109531136, "normal": 26741}}, "id": "libvirt-463"}
      2024-01-19 07:56:14.766+0000: 87226: error : virNetClientProgramDispatchError:170 : operation failed: job 'migration in' failed in post-copy phase
      2024-01-19 07:56:14.778+0000: 87226: debug : virNetServerProgramSendError:147 : prog=536903814 ver=1 proc=305 type=1 serial=11 msg=0x55eac5a6d390 rerr=0x7f487edfc9a0
      

            jdenemar@redhat.com Jiri Denemark
            rhn-support-yafu Yan Fu
            Jiri Denemark Jiri Denemark
            Luyao Huang Luyao Huang
            Votes:
            0 Vote for this issue
            Watchers:
            12 Start watching this issue

              Created:
              Updated: