CNV-38291 (OpenShift Virtualization)

For VMs that cannot be live migrated because no target node has matching resources, the LiveMigrateIfPossible eviction strategy does not prioritize upgrades and blocks them


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • CNV Documentation

      Description of problem:

      For VMs that cannot be live migrated because no target node has matching resources, the LiveMigrateIfPossible eviction strategy does not prioritize upgrades and blocks them. Our documentation states (https://docs.openshift.com/container-platform/4.14/virt/nodes/virt-node-maintenance.html#eviction-strategies):
      
      {noformat}
      The default eviction strategy is LiveMigrate. A non-migratable VM with a LiveMigrate eviction strategy might prevent nodes from draining or block an infrastructure upgrade because the VM is not evicted from the node. This situation causes a migration to remain in a Pending or Scheduling state unless you shut down the VM manually.
      
      You must set the eviction strategy of non-migratable VMs to LiveMigrateIfPossible, which does not block an upgrade, or to None, for VMs that should not be migrated.
      {noformat}
      However, when LiveMigrateIfPossible is set on VMs that cannot be live migrated due to a lack of resources, upgrades are blocked indefinitely.
      
      

      Version-Release number of selected component (if applicable):

      4.14.z, 4.15.0
      

      How reproducible:

      100%
      

      Steps to Reproduce:

      1. Create a VM attached to a bridge network and set its eviction strategy to LiveMigrateIfPossible.
      2. Start an OCP upgrade. The VM is never evicted, because no target node provides the bridge and the target virt-launcher pod cannot be scheduled.
      3. The upgrade is blocked indefinitely.
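
To make step 1 concrete, here is a minimal sketch that builds a VirtualMachine manifest with the two relevant settings: `evictionStrategy: LiveMigrateIfPossible` and a bridge interface backed by a Multus network. The VM name and the NetworkAttachmentDefinition name are hypothetical and do not come from this report. Disks and volumes are omitted, so a real reproducer would need to add them.

```python
import json

# Hypothetical names for illustration only; they are not taken from the report.
VM_NAME = "vm-bridge-repro"
NAD_NAME = "upg-br-mark-nad"


def build_vm_manifest(name: str, nad_name: str) -> dict:
    """Return a VirtualMachine manifest showing only the fields relevant here."""
    return {
        "apiVersion": "kubevirt.io/v1",
        "kind": "VirtualMachine",
        "metadata": {"name": name},
        "spec": {
            "runStrategy": "Always",
            "template": {
                "spec": {
                    # The eviction strategy under discussion in this bug.
                    "evictionStrategy": "LiveMigrateIfPossible",
                    "domain": {
                        "devices": {
                            # Bridge binding for the secondary interface.
                            "interfaces": [{"name": "bridge-net", "bridge": {}}],
                        },
                        "resources": {"requests": {"memory": "1Gi"}},
                    },
                    # The interface is backed by a Multus network attachment.
                    "networks": [
                        {"name": "bridge-net", "multus": {"networkName": nad_name}},
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    # JSON output can be piped to `oc apply -f -`.
    print(json.dumps(build_vm_manifest(VM_NAME, NAD_NAME), indent=2))
```

The manifest is emitted as JSON only to keep the sketch free of third-party dependencies; the same fields apply in YAML.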
      
      

      Actual results:

      The target virt-launcher pod reports the following event:
      Warning  FailedScheduling  3m59s  default-scheduler  0/6 nodes are available: 1 node(s) were unschedulable, 2 Insufficient bridge.network.kubevirt.io/upg-br-mark, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/6 nodes are available: 2 No preemption victims found for incoming pod, 4 Preemption is not helpful for scheduling..
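
The event above packs several per-node reasons into one line. As a reading aid, the following sketch splits such a message into a count per reason. The helper `scheduling_reasons` is hypothetical, and it assumes reasons are separated by ", " and that no reason contains a comma itself, which holds for this message but may not hold for every scheduler event.

```python
import re


def scheduling_reasons(message: str) -> dict[str, int]:
    """Map each filter reason in a FailedScheduling message to its node count."""
    # Keep only the filtering part, before the preemption summary.
    head = message.split(" preemption: ", 1)[0]
    # Drop the "0/6 nodes are available: " prefix.
    _, _, reasons = head.partition("nodes are available: ")
    result: dict[str, int] = {}
    for item in reasons.rstrip(".").split(", "):
        match = re.fullmatch(r"(\d+) (.+)", item.strip())
        if match:
            result[match.group(2)] = int(match.group(1))
    return result


if __name__ == "__main__":
    event = (
        "0/6 nodes are available: 1 node(s) were unschedulable, "
        "2 Insufficient bridge.network.kubevirt.io/upg-br-mark, "
        "3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. "
        "preemption: 0/6 nodes are available: "
        "2 No preemption victims found for incoming pod, "
        "4 Preemption is not helpful for scheduling.."
    )
    for reason, count in scheduling_reasons(event).items():
        print(f"{count} node(s): {reason}")
```

For this event the counts add up to all six nodes: one is cordoned, two lack the `bridge.network.kubevirt.io/upg-br-mark` resource, and three carry the master taint, so no node can host the migration target.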
      

      Expected results:

      According to our docs, LiveMigrateIfPossible would interrupt the workload but would not block upgrades. In this situation, however, it blocks upgrades.
      

      Additional info:

      
      

            Unassigned
            rhn-support-dbasunag Debarati Basu-Nag
            Kedar Bidarkar