OpenShift Virtualization / CNV-40762

CPU Hotplug can't be canceled when migration fails repeatedly in a loop.


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • CNV v4.18.0
    • CNV Virtualization
    • 0.42

      Description of problem:

      If the migration fails for any reason (for example, when the target pod is unschedulable and stuck in the Pending state), a new migration is initiated automatically. This process repeats indefinitely in a loop.
      
      There is no way to cancel the CPU hotplug process. Trying to revert the CPU values in the VM spec fails with this error:
      
       * spec.template.spec.domain.cpu.sockets: cannot update CPU sockets while another CPU change is in progress
      
      
      The only workaround found is to restart the VM, which defeats the purpose of hotplug: the whole point of hotplug is to avoid restarting the VM.

      Version-Release number of selected component (if applicable):

      4.15

      How reproducible:

      100%

      Steps to Reproduce:

      1. Create and run a VM with a node selector, so that no other node can host a migration target pod.
      2. Increase the CPU count of the VM.
      A VirtualMachineInstanceMigration (VMIM) object is created automatically, but the new (target) pod stays in the Pending state because there is no node it can be scheduled on.
      After the target pod has been Pending for 5 minutes, the VMIM is marked as failed and a new VMIM is created with the same result: the target pod is again stuck in Pending.

      Actual results:

      There is no way to cancel the hotplug process or revert the changes.

      Expected results:

      The hotplug process can be canceled manually, or possibly automatically (after a timeout or when the migration fails).
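      The expected behavior can be illustrated with a small state-machine model. This is a minimal, hypothetical sketch in Python, not KubeVirt code: `HotplugState`, `reconcile`, `run` and the `max_failed_migrations` cap are assumptions made for illustration only. With `max_failed_migrations=None` the model reproduces the reported loop (the CPU change stays in progress after every failed migration); with a cap, the pending change is rolled back to the current value, which would allow further spec updates again.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HotplugState:
    """Hypothetical model of a pending CPU hotplug on one VM."""
    requested_sockets: int
    current_sockets: int
    failed_migrations: int = 0
    status: str = "InProgress"  # InProgress | Completed | RolledBack


def attempt_migration(target_pod_schedulable: bool) -> bool:
    # Assumption: the migration succeeds only if the target pod can be
    # scheduled. In the report, an unschedulable target pod stays Pending
    # until the VMIM is marked as failed (after about 5 minutes).
    return target_pod_schedulable


def reconcile(state: HotplugState, target_pod_schedulable: bool,
              max_failed_migrations: Optional[int]) -> HotplugState:
    """One reconcile round: try a migration, then apply the retry policy."""
    if state.status != "InProgress":
        return state  # nothing left to do
    if attempt_migration(target_pod_schedulable):
        state.current_sockets = state.requested_sockets
        state.status = "Completed"
    else:
        state.failed_migrations += 1
        # Hypothetical cap: None means "retry forever" (the reported bug).
        if (max_failed_migrations is not None
                and state.failed_migrations >= max_failed_migrations):
            # Roll the requested value back so the spec is no longer blocked.
            state.requested_sockets = state.current_sockets
            state.status = "RolledBack"
    return state


def run(max_failed_migrations: Optional[int], target_pod_schedulable: bool,
        rounds: int) -> HotplugState:
    """Simulate a hotplug from 2 to 4 sockets over a number of rounds."""
    state = HotplugState(requested_sockets=4, current_sockets=2)
    for _ in range(rounds):
        reconcile(state, target_pod_schedulable, max_failed_migrations)
    return state
```

      For example, `run(None, False, 10)` leaves the change `InProgress` with 10 failed migrations, while `run(3, False, 10)` rolls it back to 2 sockets after the third failure.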
      
      
      

      Additional info:

      Migration can be blocked for multiple reasons, such as a nodeSelector or a lack of resources on other nodes.
      
      Once ApplicationAwareQuota is implemented, the problem may get worse, because the target pod can also be blocked by reaching the quota.
      
      Potentially, this may block a cluster upgrade, because the VM cannot be evicted from the node.

       

              Stuart Gott (sgott@redhat.com)
              Denys Shchedrivyi (dshchedr@redhat.com)
              Kedar Bidarkar