  1. OpenShift Virtualization
  2. CNV-67925

Eviction can last forever for VMs that fail to be migrated


    • Bug
    • Resolution: Unresolved
    • Major
    • CNV v4.21.0
    • CNV v4.19.0
    • CNV Virt-Cluster
    • None
    • Quality / Stability / Reliability
    • 5
    • False
    • False
    • None
    • CNV Virt-Cluster Sprint 278
    • None

      Description of problem:

      When a VM that can be live migrated is evicted (via the eviction API), the admission webhook accepts the eviction request and marks the VM for eviction (.status.evacuationNodeName).
      The eviction controller then tries to evict it by triggering a live migration. If the migration fails, another one is retried after a certain time, and so on.
      If, for any reason, the VM systematically cannot be migrated, it remains in the eviction state forever.
      
      This can block a node drain, or external components such as the descheduler that cap the number of parallel evictions.
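
To see which VMs are currently caught in this state, one option is to scan for objects that still carry the eviction marker. The sketch below is only illustrative: it assumes the marker is exposed at `.status.evacuationNodeName` of each item in a JSON list (for example the output of `kubectl get vmi -A -o json`), and the function name and sample data are hypothetical.

```python
import json


def vmis_pending_evacuation(vmi_list_json: str) -> list[tuple[str, str, str]]:
    """Return (namespace, name, node) for every item whose
    status.evacuationNodeName is set to a non-empty value."""
    doc = json.loads(vmi_list_json)
    pending = []
    for item in doc.get("items", []):
        # Assumption: the eviction marker lives at .status.evacuationNodeName.
        node = item.get("status", {}).get("evacuationNodeName")
        if node:
            meta = item.get("metadata", {})
            pending.append((meta.get("namespace", ""), meta.get("name", ""), node))
    return pending


# Hypothetical sample input: one VM marked for evacuation, one not.
SAMPLE = json.dumps({
    "items": [
        {
            "metadata": {"namespace": "default", "name": "vm-a"},
            "status": {"evacuationNodeName": "node-1"},
        },
        {
            "metadata": {"namespace": "default", "name": "vm-b"},
            "status": {},
        },
    ]
})

if __name__ == "__main__":
    for ns, name, node in vmis_pending_evacuation(SAMPLE):
        print(f"{ns}/{name} is marked for evacuation from {node}")
```

A VM that keeps appearing in this list across repeated checks is a candidate for the stuck eviction described here.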
      
      

      Version-Release number of selected component (if applicable):

      4.19

      How reproducible:

      100%

      Steps to Reproduce:

      1. Force a VM to fail a live migration.
      2. Evict it via the eviction API.
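
For step 2, the eviction API is the Kubernetes pod `eviction` subresource. The sketch below only builds the request path and body, assuming the eviction targets the pod that runs the VM; the helper name, namespace, and pod name are hypothetical, and sending the request (authentication, API server address) is left out.

```python
import json


def build_eviction_request(namespace: str, pod_name: str) -> tuple[str, str]:
    """Build the path and JSON body for a POST to the pod eviction subresource."""
    path = f"/api/v1/namespaces/{namespace}/pods/{pod_name}/eviction"
    body = {
        "apiVersion": "policy/v1",
        "kind": "Eviction",
        "metadata": {"name": pod_name, "namespace": namespace},
    }
    return path, json.dumps(body)


if __name__ == "__main__":
    # Hypothetical pod name; substitute the pod backing the VM under test.
    path, body = build_eviction_request("default", "example-vm-pod")
    print("POST", path)
    print(body)
```

The printed path and body can be sent to the API server with any HTTP client that carries valid cluster credentials.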
      

      Actual results:

      The live migration will be attempted forever

      Expected results:

      The eviction request must eventually come to a conclusion (hard stop, pause ???) within a reasonable but bounded time frame (or number of attempts)
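
One possible shape for such a bound is a retry loop limited by both an attempt count and a deadline. This is a minimal sketch of the idea, not KubeVirt's implementation: the function, its parameters, and the outcome names are all hypothetical, and what should actually happen to the VM once the budget is exhausted remains the open question above.

```python
import time
from typing import Callable


def evict_with_budget(
    try_migrate: Callable[[], bool],
    max_attempts: int,
    deadline_seconds: float,
    retry_delay_seconds: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Retry a migration until it succeeds, the attempt budget is spent,
    or the deadline has passed. Always returns a terminal outcome."""
    start = clock()
    attempts = 0
    while attempts < max_attempts:
        # Stop before starting a new attempt once the deadline has passed.
        if clock() - start >= deadline_seconds:
            return "deadline-exceeded"
        attempts += 1
        if try_migrate():
            return "migrated"
        # Wait between attempts, but not after the final one.
        if attempts < max_attempts:
            sleep(retry_delay_seconds)
    return "attempts-exhausted"


if __name__ == "__main__":
    # A migration that always fails still terminates after three attempts.
    outcome = evict_with_budget(lambda: False, max_attempts=3,
                                deadline_seconds=60, retry_delay_seconds=0)
    print(outcome)
```

Either non-success outcome gives the caller (a node drain, the descheduler) a definite answer instead of an eviction that stays pending indefinitely.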

      Additional info:

      We also have an in-progress PR to let the cluster admin cancel an eviction: https://github.com/kubevirt/kubevirt/pull/14587

       

       

              lpivarc Luboslav Pivarc
              stirabos Simone Tiraboschi
              Kedar Bidarkar Kedar Bidarkar