RHEL / RHEL-7497

Cancelled migration sometimes hangs in 'cancelling' status with multifd (or zero-copy) enabled and many multifd channels


    • None
    • Moderate
    • rhel-virt-core
    • ssg_virtualization
    • 5
    • False
    • False
    • None
    • None
    • None
    • Automated
    • None
    • 57,005

      Description of problem:
      Run a multifd migration with zero-copy enabled and cancel it while the migration is active. Sometimes the cancel does not complete and the migration hangs in 'cancelling' status.

      Version-Release number of selected component (if applicable):
      Host info: kernel-5.14.0-226.el9.aarch64 and qemu-kvm-7.2.0-2.el9.aarch64
      Guest info: kernel-4.18.0-447.el8.aarch64

      How reproducible:
      1/50

      Steps to Reproduce:
      1. Boot a guest on the source host with the QEMU command line [1].
      2. Boot the guest on the destination host with the same QEMU command line as [1], appending '-incoming defer'.
      3. Enable multifd on the source and destination hosts, enable zero-copy on the source host, and set the number of multifd channels to 4 on both hosts.
      4. Set the incoming migration URI on the destination host, then start migrating from the source host to the destination host.
      5. While the migration is active, cancel it.
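      The steps above correspond roughly to the QMP message sequence sketched below. This is a sketch under assumptions: it only builds the JSON messages and does not open a QMP socket; the capability names `multifd` and `zero-copy-send` and the `multifd-channels` parameter are assumed from the QEMU QMP schema (QEMU 7.1 and later); `build_reproducer_commands` is a hypothetical helper, not part of the original automation. The URIs mirror the ones in the automation log.

```python
import json

# Hypothetical helper: builds the QMP messages for steps 3-5 as (host, message)
# pairs. It does not talk to QEMU; sending them over QMP is left to the caller.
def build_reproducer_commands(src, dst, port=4000, channels=4, zero_copy=True):
    def caps(names):
        # Assumed capability names from the QEMU QMP schema.
        return {"execute": "migrate-set-capabilities",
                "arguments": {"capabilities": [
                    {"capability": n, "state": True} for n in names]}}

    def channels_param():
        return {"execute": "migrate-set-parameters",
                "arguments": {"multifd-channels": channels}}

    src_caps = ["multifd"] + (["zero-copy-send"] if zero_copy else [])
    return [
        # Step 3: multifd on both hosts, zero-copy on the source host only.
        (dst, caps(["multifd"])),
        (dst, channels_param()),
        (src, caps(src_caps)),
        (src, channels_param()),
        # Step 4: start listening on the destination, then start the migration.
        (dst, {"execute": "migrate-incoming",
               "arguments": {"uri": f"tcp:[::]:{port}"}}),
        (src, {"execute": "migrate",
               "arguments": {"uri": f"tcp:{dst}:{port}"}}),
        # Step 5: cancel while the migration is still active.
        (src, {"execute": "migrate_cancel"}),
    ]

if __name__ == "__main__":
    for host, msg in build_reproducer_commands("10.19.241.172", "10.19.241.174"):
        print(host, json.dumps(msg))
```

      Passing zero_copy=False gives the multifd-only configuration described under Additional info.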

      The automation log is below; 10.19.241.172 is the source host and 10.19.241.174 is the destination host:
      2023-01-13-06:00:23: Host(10.19.241.174) Sending qmp command : {"execute": "migrate-incoming", "arguments": {"uri": "tcp:[::]:4000"}, "id": "qBJcl14d"}
      2023-01-13-06:00:24: Host(10.19.241.174) Responding qmp command: {"return": {}, "id": "qBJcl14d"}
      2023-01-13-06:00:24: Host(10.19.241.172) Sending qmp command : {"execute": "migrate", "arguments": {"uri": "tcp:10.19.241.174:4000", "blk": false, "inc": false, "detach": true, "resume": false}, "id": "IQA7fhXl"}
      2023-01-13-06:00:24: Host(10.19.241.172) Responding qmp command: {"return": {}, "id": "IQA7fhXl"}
      2023-01-13-06:00:24: Host(10.19.241.172) Sending qmp command : {"execute": "query-migrate", "id": "iVo7Bktc"}
      2023-01-13-06:00:24: Host(10.19.241.172) Responding qmp command: {"return": {"status": "setup"}, "id": "iVo7Bktc"}
      2023-01-13-06:00:29: Host(10.19.241.172) Sending qmp command : {"execute": "query-migrate", "id": "NeeVUFkq"}
      2023-01-13-06:00:29: Host(10.19.241.172) Responding qmp command: {"return": {"expected-downtime": 300, "status": "active", "setup-time": 4, "total-time": 5012, "ram": {"total": 4429328384, "postcopy-requests": 0, "dirty-sync-count": 1, "multifd-bytes": 571058432, "pages-per-second": 773219, "downtime-bytes": 0, "page-size": 4096, "remaining": 2806267904, "postcopy-bytes": 0, "mbps": 615.55226016260156, "transferred": 573374799, "dirty-sync-missed-zero-copy": 0, "precopy-bytes": 2316367, "duplicate": 257305, "dirty-pages-rate": 0, "skipped": 0, "normal-bytes": 569139200, "normal": 138950}}, "id": "NeeVUFkq"}
      2023-01-13-06:00:29: Host(10.19.241.174) Sending qmp command : {"execute": "query-migrate", "id": "pZnBPkIF"}
      2023-01-13-06:00:29: Host(10.19.241.174) Responding qmp command: {"return": {"status": "active", "socket-address": [{"port": "4000", "ipv6": true, "host": "::", "type": "inet"}]}, "id": "pZnBPkIF"}
      2023-01-13-06:00:29: ======= Step 6. During migration, cancel it =======
      2023-01-13-06:00:29: ----- 6.1 Cancel migration during it is active -----
      2023-01-13-06:00:29: Host(10.19.241.172) Sending qmp command : {"execute": "migrate_cancel", "id": "Lu4AfMCD"}
      2023-01-13-06:00:35: Host(10.19.241.172) Responding qmp command: {"return": {}, "id": "Lu4AfMCD"}
      2023-01-13-06:00:35: Host(10.19.241.172) Sending qmp command : {"execute": "query-migrate", "id": "96lzw0jZ"}
      2023-01-13-06:00:35: Host(10.19.241.172) Responding qmp command: {"return": {"expected-downtime": 300, "status": "cancelling", "setup-time": 4, "total-time": 11031, "ram": {"total": 4429328384, "postcopy-requests": 0, "dirty-sync-count": 1, "multifd-bytes": 571062464, "pages-per-second": 773219, "downtime-bytes": 0, "page-size": 4096, "remaining": 2806267904, "postcopy-bytes": 0, "mbps": 615.55226016260156, "transferred": 573378831, "dirty-sync-missed-zero-copy": 0, "precopy-bytes": 2316367, "duplicate": 257305, "dirty-pages-rate": 0, "skipped": 0, "normal-bytes": 569139200, "normal": 138950}}, "id": "96lzw0jZ"}
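
      In the last reply the source still reports "status": "cancelling", and its counters have barely moved since the previous query-migrate: "transferred" grew from 573374799 to 573378831 (4032 bytes) while "total-time" grew from 5012 ms to 11031 ms. An automation harness could flag such a hang by polling query-migrate until the status leaves 'cancelling' or a deadline passes. The sketch below assumes a caller-supplied `query` function that returns the parsed "return" object of a query-migrate reply; `wait_for_cancel` and its timeout defaults are hypothetical, not part of QEMU or the original test suite.

```python
import time

def wait_for_cancel(query, timeout_s=60.0, interval_s=1.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll query-migrate until the status leaves 'cancelling'.

    Hypothetical helper: `query` is assumed to return the "return" object of
    a query-migrate reply as a dict. Returns the first status that is not
    'cancelling' (e.g. 'cancelled'), or 'hung' if the migration is still
    'cancelling' when the deadline passes.
    """
    deadline = clock() + timeout_s
    while True:
        status = query().get("status")
        if status != "cancelling":
            return status
        if clock() >= deadline:
            return "hung"
        sleep(interval_s)
```

      Called right after migrate_cancel, a return value of "hung" corresponds to the failure described in this report; `clock` and `sleep` are injectable so the loop can be exercised without real delays.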

      Actual results:
      At step 5 of Steps to Reproduce, the cancel hangs: the migration stays in 'cancelling' status and never finishes cancelling.

      Expected results:
      The migration is cancelled successfully.

      Additional info:
      1. Tried 300 times with plain migration (neither zero-copy nor multifd enabled); cancelling the migration always succeeded.
      2. Tried 100 times with only multifd enabled and the number of multifd channels set to 4; cancelling the migration also always succeeded.

              virt-maint
              Xiaohui Li (rhn-support-xiaohli)
              Xiaohui Li
              Votes: 0
              Watchers: 19
