  1. Migration Toolkit for Virtualization
  2. MTV-2287

Canceling a single VM can cause the Plan controller to loop forever


    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Critical
    • 2.8.1
    • 2.7.0, 2.8.0
    • Controller
    • Incidents & Support
    • 5

      Description of problem:

      The change that makes the migration runner schedule as many VMs as it can in a single reconciliation (https://issues.redhat.com/browse/MTV-1774) has exposed a bug in the scheduler: in certain pathological cases, canceled VMs are rescheduled endlessly, which prevents the reconciliation from terminating.
      
      For instance, if a VM is canceled before it has been marked started, it will always appear ready to schedule although there is nothing for the plan controller to do with it.
      
      The endless rescheduling puts the controller into an infinite loop, leaving it unable to reconcile resources. Recovery requires deleting the offending Migration resource and restarting the controller pod.
      
      Checking for the Canceled condition on the VM is adequate to prevent the problem. (Requesting cancellation of the VM via the Migration resource will then cause the VM to be marked with the Canceled condition in memory as the Plan is reconciled, which will cause it to be ignored by the scheduler, and on the following reconcile it will be cleaned up as intended.)
      
      This needs to be fixed in the scheduler for each of the providers. 
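
      A minimal sketch of the guard described above, in Go. The types and function names here (VM, markCanceled, nextReady, runReconcile) are hypothetical stand-ins and not the actual forklift scheduler API, and the cancel list is assumed to be a plain list of VM IDs. The point is only that a VM carrying the Canceled condition is skipped, so the scheduling loop runs out of candidates and terminates.

```go
package main

import "fmt"

// VM is a hypothetical, simplified stand-in for a plan's per-VM status.
type VM struct {
	ID       string
	Started  bool
	Canceled bool // stands in for the Canceled condition
}

// markCanceled sets the Canceled flag in memory on every VM whose ID
// appears in the cancel list (assumed shape: a list of VM IDs).
func markCanceled(vms []*VM, cancel []string) {
	requested := make(map[string]bool, len(cancel))
	for _, id := range cancel {
		requested[id] = true
	}
	for _, vm := range vms {
		if requested[vm.ID] {
			vm.Canceled = true
		}
	}
}

// nextReady returns the next VM that is eligible to be scheduled.
// Skipping canceled VMs is the guard: a canceled VM is never marked
// started, so without this check it would look ready on every pass.
func nextReady(vms []*VM) (*VM, bool) {
	for _, vm := range vms {
		if vm.Canceled || vm.Started {
			continue
		}
		return vm, true
	}
	return nil, false
}

// runReconcile schedules as many VMs as it can in one pass and returns
// the IDs of the VMs it started.
func runReconcile(vms []*VM, cancel []string) []string {
	markCanceled(vms, cancel)
	var started []string
	for {
		vm, ok := nextReady(vms)
		if !ok {
			return started // no candidates left, so the pass terminates
		}
		vm.Started = true
		started = append(started, vm.ID)
	}
}

func main() {
	vms := []*VM{{ID: "vm-1"}, {ID: "vm-2"}}
	fmt.Println(runReconcile(vms, []string{"vm-1"}))
}
```

      In this sketch the canceled VM is left untouched by the scheduling pass; cleaning it up would happen separately, on the following reconcile.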

       

      Version-Release number of selected component (if applicable):

      2.7

      How reproducible:

      Always

      Steps to Reproduce:

      1. Create a Plan with a single VM.
      2. Create a Migration resource with the ID of the VM in the "cancel" field.
      3. Observe that the statuses of the Migration and the Plan do not change, while the controller logs grow without bound as the controller endlessly reschedules the canceled VM.

       

      Actual results:

      The forklift-controller spins forever and resources cease to be reconciled, breaking any migrations that may already be in progress.

      Expected results:

      A canceled VM should be marked with the Canceled condition and no longer scheduled, and its resources should be cleaned up on a subsequent reconcile. Once the last VM is complete or canceled, the Plan status should reflect that and the controller should quiesce.
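
      The expected end state can be written as a simple predicate. The sketch below is illustrative only: VMStatus and planDone are hypothetical names, not part of the forklift API. It treats a Plan as finished once every VM is either completed or canceled, which is the point at which the controller should have nothing left to do.

```go
package main

import "fmt"

// VMStatus is a hypothetical, simplified view of one VM in a Plan.
type VMStatus struct {
	ID        string
	Completed bool
	Canceled  bool
}

// planDone reports whether every VM has reached a terminal state,
// meaning it is either completed or canceled.
func planDone(vms []VMStatus) bool {
	for _, vm := range vms {
		if !vm.Completed && !vm.Canceled {
			return false
		}
	}
	return true
}

func main() {
	// The reproduction case: a single VM that was canceled.
	single := []VMStatus{{ID: "vm-1", Canceled: true}}
	fmt.Println(planDone(single))

	// One VM is still pending, so the plan is not finished yet.
	mixed := []VMStatus{
		{ID: "vm-1", Canceled: true},
		{ID: "vm-2"},
	}
	fmt.Println(planDone(mixed))
}
```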

      Additional info:

       

              slucidi@redhat.com Samuel Lucidi
              slucidi@redhat.com Samuel Lucidi
              Chenli Hu Chenli Hu
