RHEL-71126

[vfio migration] The actual total downtime is much greater than the maxdowntime we set


    • rhel-virt-core
    • ssg_virtualization
    • aarch64

      What were you trying to do that didn't work?

      After a VFIO live migration of a VM with multiple VFs, the actual total downtime is much greater than the maxdowntime we set.

      Please provide the package NVR for which the bug is seen:

      5.14.0-539.el9.aarch64
      libvirt-10.10.0-1.el9.aarch64
      qemu-kvm-9.1.0-6.el9.aarch64

      How reproducible is this bug?:

      100%

      Steps to reproduce

      1. Start a VM with 4 VFs:
        # virsh dumpxml avocado-vt-vm1 --xpath //hostdev
        <hostdev mode="subsystem" type="pci" managed="yes">
          <driver name="vfio"/>
          <source>
            <address domain="0x0000" bus="0x01" slot="0x00" function="0x5"/>
          </source>
          <address type="pci" domain="0x0000" bus="0x01" slot="0x00" function="0x0"/>
        </hostdev>
        <hostdev mode="subsystem" type="pci" managed="yes">
          <driver name="vfio"/>
          <source>
            <address domain="0x0000" bus="0x01" slot="0x00" function="0x2"/>
          </source>
          <address type="pci" domain="0x0000" bus="0x08" slot="0x00" function="0x0"/>
        </hostdev>
        <hostdev mode="subsystem" type="pci" managed="yes">
          <driver name="vfio"/>
          <source>
            <address domain="0x0000" bus="0x01" slot="0x00" function="0x3"/>
          </source>
          <address type="pci" domain="0x0000" bus="0x09" slot="0x00" function="0x0"/>
        </hostdev>
        <hostdev mode="subsystem" type="pci" managed="yes">
          <driver name="vfio"/>
          <source>
            <address domain="0x0000" bus="0x01" slot="0x00" function="0x4"/>
          </source>
          <address type="pci" domain="0x0000" bus="0x0a" slot="0x00" function="0x0"/>
        </hostdev>
        
      2. virsh migrate-setmaxdowntime avocado-vt-vm1 280
      3. virsh migrate --live --verbose --domain avocado-vt-vm1 --desturi qemu+tcp://10.26.1.119/system
      4. Check the downtime on the destination after migration using 'virsh domjobinfo avocado-vt-vm1 --completed'
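
The check in step 4 can be automated. The following is a minimal sketch, not part of the original report: it parses the text printed by 'virsh domjobinfo --completed' and compares 'Total downtime' with the configured limit. The helper names (`parse_domjobinfo`, `total_downtime_ms`, `exceeds_limit`) and the 10% tolerance are assumptions made for illustration; the sample lines are copied from the 4-VF run in this report.

```python
import re

# Sample lines copied from the 4-VF run in this report.
SAMPLE = """\
Job type:         Completed
Operation:        Incoming migration
Total downtime:   1230         ms
Downtime w/o network: 1229         ms
Setup time:       147          ms
"""


def parse_domjobinfo(text):
    """Map each 'Key: value' line of domjobinfo output to a dict entry."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            # Collapse the column padding virsh inserts between value and unit.
            fields[key.strip()] = " ".join(value.split())
    return fields


def total_downtime_ms(text):
    """Return the 'Total downtime' field as an integer number of milliseconds."""
    value = parse_domjobinfo(text)["Total downtime"]
    match = re.fullmatch(r"(\d+) ms", value)
    if match is None:
        raise ValueError(f"unexpected 'Total downtime' value: {value!r}")
    return int(match.group(1))


def exceeds_limit(text, max_downtime_ms, tolerance=0.10):
    """True if the downtime is above the limit plus a tolerance.

    The 10% default tolerance is an assumption standing in for
    "less than or close to the value we set".
    """
    return total_downtime_ms(text) > max_downtime_ms * (1 + tolerance)


if __name__ == "__main__":
    print(total_downtime_ms(SAMPLE), exceeds_limit(SAMPLE, 280))
```

In practice the text would come from running the virsh command on the destination host; it is passed in as a string here so the sketch stays self-contained.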

      Expected results

      The value of 'Total downtime' should be less than or close to the value we set.

      Actual results

      The actual downtime is 1230 ms, far more than the 280 ms we set.

      # virsh domjobinfo avocado-vt-vm1  --completed
      Job type:         Completed   
      Operation:        Incoming migration
      Time elapsed:     356540       ms
      Time elapsed w/o network: 356539       ms
      Data processed:   6.156 GiB
      Data remaining:   0.000 B
      Data total:       8.125 GiB
      Memory processed: 6.156 GiB
      Memory remaining: 0.000 B
      Memory total:     8.125 GiB
      Memory bandwidth: 17.967 MiB/s
      Dirty rate:       0            pages/s
      Page size:        4096         bytes
      Iteration:        223466      
      Postcopy requests: 0           
      Constant pages:   1960235     
      Normal pages:     1145319     
      Normal data:      4.369 GiB
      Total downtime:   1230         ms
      Downtime w/o network: 1229         ms
      Setup time:       147          ms
      
      

      Others

      If the VM has 1 VF, the total downtime is ~403 ms:

      # virsh domjobinfo avocado-vt-vm1  --completed
      Job type:         Completed   
      Operation:        Incoming migration
      Time elapsed:     12348        ms
      Time elapsed w/o network: 12347        ms
      Data processed:   887.443 MiB
      Data remaining:   0.000 B
      Data total:       8.125 GiB
      Memory processed: 887.443 MiB
      Memory remaining: 0.000 B
      Memory total:     8.125 GiB
      Memory bandwidth: 109.076 MiB/s
      Dirty rate:       0            pages/s
      Page size:        4096         bytes
      Iteration:        3           
      Postcopy requests: 0           
      Constant pages:   1913326     
      Normal pages:     220155      
      Normal data:      859.980 MiB
      Total downtime:   403          ms
      Downtime w/o network: 402          ms
      Setup time:       39           ms
      

      With 2 VFs, it is 617 ms:

      # virsh domjobinfo avocado-vt-vm1  --completed
      Job type:         Completed   
      Operation:        Incoming migration
      Time elapsed:     12820        ms
      Time elapsed w/o network: 12819        ms
      Data processed:   878.694 MiB
      Data remaining:   0.000 B
      Data total:       8.125 GiB
      Memory processed: 878.694 MiB
      Memory remaining: 0.000 B
      Memory total:     8.125 GiB
      Memory bandwidth: 107.275 MiB/s
      Dirty rate:       0            pages/s
      Page size:        4096         bytes
      Iteration:        3           
      Postcopy requests: 0           
      Constant pages:   1914681     
      Normal pages:     216754      
      Normal data:      846.695 MiB
      Total downtime:   617          ms
      Downtime w/o network: 616          ms
      Setup time:       76           ms
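
To make the trend across the three runs easier to see, the sketch below computes how much downtime each additional VF adds between consecutive measurements. It is illustrative only: `per_vf_increments` is a hypothetical helper, and the figures are the 'Total downtime' values reported in this issue for 1, 2 and 4 VFs. The report does not state whether the same 280 ms limit was set for the 1-VF and 2-VF runs, and the 4-VF run also shows far more iterations and lower memory bandwidth, so the runs may not be directly comparable.

```python
# Total downtime (ms) by number of VFs, copied from the three runs in this report.
DOWNTIME_MS = {1: 403, 2: 617, 4: 1230}


def per_vf_increments(samples):
    """Return (from_vfs, to_vfs, ms_per_added_vf) for consecutive measurements."""
    points = sorted(samples.items())
    return [
        (n0, n1, (d1 - d0) / (n1 - n0))
        for (n0, d0), (n1, d1) in zip(points, points[1:])
    ]


if __name__ == "__main__":
    for n0, n1, ms_per_vf in per_vf_increments(DOWNTIME_MS):
        print(f"{n0} -> {n1} VFs: {ms_per_vf:.1f} ms per added VF")
```

With only three data points this is a rough indication that downtime grows with the number of VFs, not a model of the underlying cause.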
      
      
      

              virt-maint virt-maint
              yicui1 Yingshun Cui
              virt-maint virt-maint
              virt-bugs virt-bugs