  1. OpenShift Virtualization
  2. CNV-60505

[arm64] CPU hotplug fails and creates stuck migration job


    • Bug
    • Resolution: Done-Errata
    • Major
    • CNV v4.19.1
    • CNV v4.19.0
    • CNV Virt-Cluster
    • None
    • CNV Virt-Cluster Sprint 271, CNV Virt-Cluster Sprint 272, CNV Virt-Cluster Sprint 273
    • Important
    • None

      Description of problem:

      On an arm64 cluster, hotplugging additional CPU sockets fails and leaves the migration job stuck in the Running phase indefinitely.

      Version-Release number of selected component (if applicable):

      4.19

      How reproducible:

      100%

      Steps to Reproduce:

      1. Create and start a VM with sockets=1 and maxSockets=4 in the CPU spec.
      2. Edit the VM CPU spec and set sockets=2.
      3. Wait for the migration to happen.
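
The first two steps can be sketched as the CPU stanza of the VM spec plus a JSON merge patch. This is a minimal sketch: the path spec.template.spec.domain.cpu follows the KubeVirt VirtualMachine API, while the helper names and the VM name hotplug-vm in the comment are hypothetical.

```python
import json


def cpu_spec(sockets, max_sockets):
    """CPU stanza for step 1 (sockets=1, maxSockets=4 in this report)."""
    return {"sockets": sockets, "maxSockets": max_sockets}


def socket_patch(sockets):
    """JSON merge patch for step 2: change only the socket count."""
    return {
        "spec": {"template": {"spec": {"domain": {"cpu": {"sockets": sockets}}}}}
    }


initial = cpu_spec(1, 4)
patch = json.dumps(socket_patch(2))

# The patch could then be applied with, for example (hypothetical VM name):
#   oc patch vm hotplug-vm --type merge -p "$PATCH"
print(patch)
```

Patching only the sockets field leaves maxSockets untouched, which matches the edit described in step 2.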
      

      Actual results:

      Hotplug fails and the migration job is stuck in Running.

      Expected results:

      1. Hotplug succeeds.
      2. If hotplug fails, a proper error should be raised and the migration job should be marked as failed.

      Additional info:
      The VMI events during migration report the failure as a Warning rather than an error:

      Normal   Migrating        2m15s                  virt-handler  VirtualMachineInstance is migrating.
      Normal   PreparingTarget  2m14s (x2 over 2m15s)  virt-handler  VirtualMachineInstance Migration Target Prepared.
      Warning  server error. command SyncVirtualMachineCPUs failed: "failed to update vCPUs: virError(Code=8, Domain=10, Message='invalid argument: requested vcpus is greater than max allowable vcpus for the live domain: 2 > 1')"  2m12s  virt-handler  failed to change vCPUs
      Normal   Migrated         2m12s                  virt-handler  The VirtualMachineInstance migrated to node ip-10-0-59-129.us-east-2.compute.internal.
      Normal   Deleted          2m12s                  virt-handler  Signaled Deletion
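
The numbers at the end of the libvirt message ("2 > 1") are the requested and the maximum allowed vCPU counts. A minimal sketch for pulling them out of the event text follows. It assumes the message ends with "<requested> > <max>" exactly as in the event above, which is not a guaranteed libvirt format, and the function name is hypothetical.

```python
import re

# Message text copied from the Warning event in this report.
MESSAGE = (
    "invalid argument: requested vcpus is greater than max allowable "
    "vcpus for the live domain: 2 > 1"
)


def parse_vcpu_limit(message):
    """Return (requested, max_allowed), or None if the pattern is absent."""
    match = re.search(r"(\d+)\s*>\s*(\d+)\s*$", message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


requested, max_allowed = parse_vcpu_limit(MESSAGE)
print(f"requested={requested} max_allowed={max_allowed}")
```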

      Even though the hotplug fails, the VM is still moved to the destination node, the source pod is Completed, and the migration job is stuck in Running:

       

      $ oc -n cluster-aaq-test-arq get pods
      NAME                                                            READY   STATUS      RESTARTS   AGE
      virt-launcher-hotplug-vm-for-aaq-test-1745417004-344818-fqj4g   3/3     Running     0          105s
      virt-launcher-hotplug-vm-for-aaq-test-1745417004-344818-w4lwn   0/3     Completed   0          45m
      $ oc -n cluster-aaq-test-arq get vmim
      NAME                             PHASE     VMI
      kubevirt-workload-update-9tlsm   Running   hotplug-vm-for-aaq-test-1745417004-344818 
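
The inconsistent state above can be expressed as a simple check. This is only an illustrative heuristic under the assumption that a finished handoff shows one Completed and one Running virt-launcher pod. It is not how KubeVirt itself tracks migrations, and the function name is hypothetical.

```python
def looks_stuck(vmim_phase, pod_statuses):
    """True if the pods show a finished handoff but the VMIM is still Running."""
    handed_off = "Completed" in pod_statuses and "Running" in pod_statuses
    return vmim_phase == "Running" and handed_off


# Pod names and statuses copied from the output in this report.
pods = {
    "virt-launcher-hotplug-vm-for-aaq-test-1745417004-344818-fqj4g": "Running",
    "virt-launcher-hotplug-vm-for-aaq-test-1745417004-344818-w4lwn": "Completed",
}

stuck = looks_stuck("Running", list(pods.values()))
print(stuck)
```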

      In the dumpxml output only 1 vCPU is listed (meaning the maxSockets value from the VM spec is ignored):

      <vcpu placement='static'>1</vcpu>
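
In libvirt domain XML, the text of the vcpu element is the maximum vCPU count and the optional current attribute is the number currently online. The sketch below reads both values and checks whether a hotplug request fits. The "expected" element is an assumption about what the domain would look like if maxSockets=4 were honoured with one core and one thread per socket (the report does not state cores or threads), and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def vcpu_counts(vcpu_xml):
    """Return (maximum, current) vCPUs from a libvirt <vcpu> element."""
    elem = ET.fromstring(vcpu_xml)
    maximum = int(elem.text)
    # Without a 'current' attribute, all vCPUs up to the maximum are online.
    current = int(elem.get("current", maximum))
    return maximum, current


def can_hotplug(vcpu_xml, requested):
    """A live domain can only grow up to its maximum vCPU count."""
    maximum, _ = vcpu_counts(vcpu_xml)
    return requested <= maximum


# Element observed in this report.
observed = "<vcpu placement='static'>1</vcpu>"
# Assumed shape if maxSockets=4 had been applied (1 core, 1 thread per socket).
expected = "<vcpu placement='static' current='1'>4</vcpu>"

print(can_hotplug(observed, 2), can_hotplug(expected, 2))
```

With the observed element the maximum is 1, so a request for 2 vCPUs is rejected, which is consistent with the "2 > 1" error in the VMI events.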

              lpivarc Luboslav Pivarc
              vsibirsk Vasiliy Sibirskiy
              Sibo Wang Sibo Wang