OpenShift Virtualization / CNV-70330

Live migration of VM fails when vNUMA is enabled and target node is not empty


    • Quality / Stability / Reliability
    • 0.42
    • False
    • False
    • None
      * Live migration might fail when you migrate a VM that has vNUMA enabled while the `topologyManagerPolicy` setting in the KubeletConfig is set to `none`. The failure is caused by conflicting NUMA cells under that Topology Manager policy. As a result, the migration fails with a NUMA cell mismatch error. To work around this issue, set `topologyManagerPolicy` in the KubeletConfig to either `best-effort` or `single-numa-node`. (link:https://issues.redhat.com/browse/CNV-70330[*CNV-70330*])
    • Known Issue
    • Done
    • None

      Description of problem:

      Create 2-3 VMs (depending on the number of nodes) with NUMA passthrough enabled. Each VM should run on a separate node and have a core count between half and the full number of cores in a single NUMA node. The nodes must have enough resources to host 2 VMs on the same node. Trigger live migration of VM1 so that it moves to the node where VM2 is running. Live migration fails with the error: virError(Code=27, Domain=20, Message='XML error: Argument 'cellid' in memnode element must correspond to existing guest's NUMA cell')
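
      For orientation, a VM with NUMA passthrough enabled might look roughly like the sketch below. This is an assumption-laden illustration, not the manifest used in this report: the VMs here were created from an instancetype (see the attached virtualmachineinstancetype.yaml), and the name, core count, memory size, and hugepage size are hypothetical.

      ```yaml
      apiVersion: kubevirt.io/v1
      kind: VirtualMachine
      metadata:
        name: vm-numa-1                    # hypothetical name
      spec:
        runStrategy: Always
        template:
          spec:
            domain:
              cpu:
                cores: 12                  # illustrative: between half and all cores of one host NUMA node
                dedicatedCpuPlacement: true
                numa:
                  guestMappingPassthrough: {}   # mirror the host NUMA topology into the guest
              memory:
                guest: 8Gi                 # illustrative size
                hugepages:
                  pageSize: 1Gi            # illustrative; NUMA passthrough is used with hugepages
              devices: {}
      ```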

      Topology manager policy was set to:

        spec:
          kubeletConfig:
            cpuManagerPolicy: static
            topologyManagerPolicy: none
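
      The workaround named in the release note would correspond to a KubeletConfig along the following lines. This is a minimal sketch: the object name and the pool selector label are hypothetical and must match a label on the target MachineConfigPool; only the `topologyManagerPolicy` value differs from the failing configuration above.

      ```yaml
      apiVersion: machineconfiguration.openshift.io/v1
      kind: KubeletConfig
      metadata:
        name: cpumanager-topology                 # hypothetical name
      spec:
        machineConfigPoolSelector:
          matchLabels:
            custom-kubelet: cpumanager-enabled    # hypothetical label on the worker MachineConfigPool
        kubeletConfig:
          cpuManagerPolicy: static
          topologyManagerPolicy: single-numa-node # or best-effort; `none` triggers the failure
      ```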
       

      Testing with additional policies is captured here: https://docs.google.com/document/d/169adfy9QFd3YsiyXtzaEA3NE9jSBEoneCYL0mWw2gt8/edit?tab=t.0

      Version-Release number of selected component (if applicable):

      4.20.0

      How reproducible:

      100%

      Steps to Reproduce:

      1. Create as many VMs with vNUMA enabled as there are worker nodes, so that every node already runs one VM.
      2. Start live migration of one VM.
      3. Observe that the live migration fails.
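
      Step 2 can be triggered by creating a migration object such as the one sketched below. The object and VMI names are hypothetical; the attached migration_obj_yaml holds the object actually used in this report.

      ```yaml
      apiVersion: kubevirt.io/v1
      kind: VirtualMachineInstanceMigration
      metadata:
        name: migrate-vm1        # hypothetical name
      spec:
        vmiName: vm1             # hypothetical: the running VMI to migrate
      ```

      Because every worker node already hosts a vNUMA VM, any target node chosen by the scheduler is non-empty, which is the condition under which the failure was observed.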
      

      Actual results:

      Live migration fails with the libvirt error shown above.

      Expected results:

      The VM is successfully migrated.

      Additional info:

       

        1. migration_obj
          5 kB
        2. migration_obj_yaml
          4 kB
        3. virt_launcher_log
          8 kB
        4. virtualmachineinstancetype.yaml
          3 kB
        5. vmis
          27 kB
        6. vmis_yaml
          18 kB
        7. vms
          11 kB
        8. vms_yaml
          10 kB

              ksimon@redhat.com Karel Simon
              ksimon@redhat.com Karel Simon
              Geetika Kapoor Geetika Kapoor