RHEL-7096

The migration port is not released if it is reused immediately to recover a postcopy migration

    • Minor
    • sst_virtualization
    • ssg_virtualization
    • 5
    • False
    • Known Issue
      .Recovering an interrupted post-copy VM migration might fail
      If a post-copy migration of a virtual machine (VM) is interrupted and then immediately resumed on the same incoming port, the migration might fail with the following error: `Address already in use`

      To work around this problem, wait at least 10 seconds before resuming the post-copy migration or switch to another port for migration recovery.
    • Done

      Description of problem:
      The migration port is not released if it is reused immediately to recover a postcopy migration

      Version-Release number of selected component (if applicable):
      hosts info: kernel-5.14.0-284.el9.s390x && qemu-kvm-7.2.0-12.el9_2.s390x
      guest info: kernel-5.14.0-284.el9.s390x

      How reproducible:
      2/3

      Steps to Reproduce:
      1. Boot a guest on the src host with qemu commands[1]
      2. Run stressapptest in the src guest:
         stressapptest -M 400 -s 100000 > /dev/null &
      3. Boot a guest on the dst host, appending "-incoming defer"
      4. Enable postcopy-ram on the src and dst hosts, and set max-postcopy-bandwidth to 5M on the src host
      {"execute": "migrate-set-capabilities", "arguments": {"capabilities": ...}, "id": "kcR6zlRt"}

      {"execute": ...}
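The QMP payloads for step 4 did not survive intact in this report. As a rough sketch only, the Python snippet below builds commands that would enable `postcopy-ram` and cap the postcopy bandwidth, using the QMP commands `migrate-set-capabilities` and `migrate-set-parameters`. The helper names are hypothetical, and reading "5M" as 5 MiB/s (the parameter is expressed in bytes per second) is an assumption, not something stated in the original report.

```python
import json


def set_capability_cmd(name, state=True):
    """Build a QMP migrate-set-capabilities command for a single capability."""
    return {
        "execute": "migrate-set-capabilities",
        "arguments": {"capabilities": [{"capability": name, "state": state}]},
    }


def set_postcopy_bandwidth_cmd(bytes_per_second):
    """Build a QMP migrate-set-parameters command limiting postcopy bandwidth."""
    return {
        "execute": "migrate-set-parameters",
        "arguments": {"max-postcopy-bandwidth": bytes_per_second},
    }


# Assumption: "5M" in the report means 5 MiB per second.
FIVE_MIB_PER_SECOND = 5 * 1024 * 1024

if __name__ == "__main__":
    # postcopy-ram is enabled on both hosts; the bandwidth cap only on src.
    print(json.dumps(set_capability_cmd("postcopy-ram")))
    print(json.dumps(set_postcopy_bandwidth_cmd(FIVE_MIB_PER_SECOND)))
```

The printed lines can be sent over a QMP monitor socket; the "id" field used in the report is optional and omitted here.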

      5. Migrate the guest to the dst host; during migration, switch to postcopy mode
      dst qmp:

      {"execute": ...}

      src qmp:
      {"execute": "migrate", "arguments": {"uri": "tcp:10.0.160.22:4000", "blk": false, "inc": false, "detach": true, "resume": false}, "id": "wAeA93G3"}

      {"execute": "query-migrate", "id": "Ux4M0tUE"}

      {"return": {"expected-downtime": 300, "status": "active", "setup-time": 389, "total-time": 401, "ram": {"total": 4294967296, "postcopy-requests": 0, "dirty-sync-count": 1, "multifd-bytes": 0, "pages-per-second": 0, "downtime-bytes": 0, "page-size": 4096, "remaining": 4294914048, "postcopy-bytes": 0, "mbps": 0, "transferred": 16495, "dirty-sync-missed-zero-copy": 0, "precopy-bytes": 20599, "duplicate": 7, "dirty-pages-rate": 0, "skipped": 0, "normal-bytes": 16384, "normal": 4}}, "id": "Ux4M0tUE"}

      {"execute": "migrate-start-postcopy", "id": "DeHSiUGi"}

      {"execute": "query-migrate", "id": "hgC8wXzd"}

      {"return": {"expected-downtime": 300, "status": "postcopy-active", "setup-time": 389, "total-time": 500, "ram": {"total": 4294967296, "postcopy-requests": 0, "dirty-sync-count": 1, "multifd-bytes": 0, "pages-per-second": 8113, "downtime-bytes": 0, "page-size": 4096, "remaining": 4281475072, "postcopy-bytes": 0, "mbps": 264.52334975369456, "transferred": 13424415, "dirty-sync-missed-zero-copy": 0, "precopy-bytes": 13424415, "duplicate": 23, "dirty-pages-rate": 0, "skipped": 0, "normal-bytes": 13398016, "normal": 3271}}, "id": "hgC8wXzd"}

      dst qmp:

      {"execute": "query-migrate", "id": "1IvIYeLw"}

      {"return": {"status": "postcopy-active", "socket-address": [{"port": "4000", "ipv6": true, "host": "::", "type": "inet"}]}, "id": "1IvIYeLw"}

      6. During the postcopy phase, pause the migration
      src qmp:

      {"execute": "migrate-pause", "id": "ujcerLtX"}

      {"execute": "query-migrate", "id": "GOOG9F7q"}

      {"return": {"expected-downtime": 300, "status": "postcopy-paused", "setup-time": 389, "total-time": 7839, "ram": {"total": 4294967296, "postcopy-requests": 85, "dirty-sync-count": 2, "multifd-bytes": 0, "pages-per-second": 1370, "downtime-bytes": 0, "page-size": 4096, "remaining": 4237402112, "postcopy-bytes": 39311230, "mbps": 42.032879999999999, "transferred": 52735645, "dirty-sync-missed-zero-copy": 0, "precopy-bytes": 13424415, "duplicate": 1213, "dirty-pages-rate": 0, "skipped": 0, "normal-bytes": 52621312, "normal": 12847}}, "id": "GOOG9F7q"}

      dst qmp:

      {"execute": "query-migrate", "id": "l9XIQfO0"}

      {"return": {"status": "postcopy-paused", "socket-address": [{"port": "4000", "ipv6": true, "host": "::", "type": "inet"}]}, "id": "l9XIQfO0"}
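Before attempting recovery, a test script can confirm that the destination reports `postcopy-paused` and read back the port it was listening on. The following is a minimal sketch, assuming the reply has the same shape as the dst `query-migrate` output above; the function names are hypothetical.

```python
import json


def migration_status(reply):
    """Return the migration status string from a query-migrate reply, or None."""
    return reply.get("return", {}).get("status")


def listening_ports(reply):
    """Return the inet ports listed under "socket-address" as integers."""
    addresses = reply.get("return", {}).get("socket-address", [])
    return [int(a["port"]) for a in addresses if a.get("type") == "inet"]


# The dst reply from step 6 of this report, parsed from its JSON form.
DST_REPLY = json.loads(
    '{"return": {"status": "postcopy-paused", "socket-address": '
    '[{"port": "4000", "ipv6": true, "host": "::", "type": "inet"}]}, '
    '"id": "l9XIQfO0"}'
)

if __name__ == "__main__":
    print(migration_status(DST_REPLY), listening_ports(DST_REPLY))
```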

      7. Recover the postcopy migration immediately after the status becomes postcopy-paused, using the same migration port (4000)
      dst qmp:
      {"exec-oob": "migrate-recover", "arguments": {"uri": "tcp:[::]:4000"}, "id": "243Y88Wk"}
      {"id": "243Y88Wk", "error": {"class": "GenericError", "desc": "Failed to find an available port: Address already in use"}}

      Actual results:
      As in step 7: migrate-recover fails with "Failed to find an available port: Address already in use"

      Expected results:
      Migration recovery succeeds

      Additional info:
      1. If you wait 10 s after step 6, step 7 usually succeeds.
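The waiting workaround can be automated in a test harness. The sketch below retries `migrate-recover` on the destination while the reply contains "Address already in use", pausing between attempts. It is an illustration only: `send` is a hypothetical caller-supplied function that delivers one QMP command and returns the parsed reply, and the attempt count and 10-second delay are assumptions based on the observation above, not tuned values.

```python
import time

ADDR_IN_USE = "Address already in use"


def recover_with_retry(send, uri, attempts=5, delay=10.0, sleep=time.sleep):
    """Send migrate-recover, waiting and retrying while the port is still busy.

    `send` takes a QMP command dict and returns the reply dict.
    The last reply is returned, whether it is a success or an error.
    """
    cmd = {"exec-oob": "migrate-recover", "arguments": {"uri": uri}}
    reply = {}
    for attempt in range(attempts):
        reply = send(cmd)
        if "error" not in reply:
            return reply  # recovery was accepted
        if ADDR_IN_USE not in reply["error"].get("desc", ""):
            return reply  # a different failure; do not hide it by retrying
        if attempt < attempts - 1:
            sleep(delay)  # give the old listener time to go away
    return reply
```

To try the other workaround, switching ports, call the function again with a different `uri` once the attempts are exhausted. Note that `exec-oob` is only accepted if out-of-band execution was enabled when the QMP session was negotiated.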

            rh-ee-clegoate Cédric Le Goater
            rhn-support-xiaohli Xiaohui Li
            Daniel Vozenilek