OpenShift Virtualization / CNV-57547

OpenShift Virtualization - VM does not fail over to another node when its node is powered off

    • Important

      Description of problem:

      Version-Release number of selected component (if applicable):

      OpenShift Virtualization 4.18 on x86 and s390x

      How reproducible:

          The VM is deployed on ODF storage and can live-migrate between nodes. Consequently, in an unplanned situation such as a node failure or a network cut, the VM running on the failed or cut-off node must fail over automatically to another node, in the same way as the pods of a stateless or stateful Deployment backed by a ReplicaSet. However, this does not happen: the underlying pod stays in the Terminating state on the failed node forever, and the VM remains in status Running on the powered-off node.
      Problem: Any potentially critical workload on the VM becomes unavailable, and the RTO (recovery time objective) is not met.

      Steps to Reproduce:

          1. OCP 4.18 on x86 or s390x, with ODF installed.
          2. Deploy a VM backed by ODF (openshift-csi-rdb-virtualization as the default storage class).
          3. Power off the node the VM is running on.
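      For reference, a VM matching step 2 can be written as a KubeVirt VirtualMachine manifest. The Python sketch below only builds and prints that manifest as JSON. Only the storage class name comes from step 2; the VM name, namespace, boot image URL, disk size, memory request, and the ReadWriteMany/Block volume settings are assumptions for illustration and are not taken from the reported setup.

```python
import json

# Hypothetical values for illustration; adjust to the cluster under test.
VM_NAME = "failover-test-vm"
NAMESPACE = "default"
STORAGE_CLASS = "openshift-csi-rdb-virtualization"  # storage class named in step 2
IMAGE_URL = "docker://quay.io/containerdisks/fedora:latest"  # assumed boot image


def build_vm_manifest(name: str, namespace: str, storage_class: str, image_url: str) -> dict:
    """Return a VirtualMachine whose root disk is a DataVolume on storage_class."""
    disk_name = f"{name}-rootdisk"
    return {
        "apiVersion": "kubevirt.io/v1",
        "kind": "VirtualMachine",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "runStrategy": "Always",
            "dataVolumeTemplates": [
                {
                    "metadata": {"name": disk_name},
                    "spec": {
                        "source": {"registry": {"url": image_url}},
                        "storage": {
                            "storageClassName": storage_class,
                            # Assumed: shared (RWX) block storage so the VM can live-migrate
                            "accessModes": ["ReadWriteMany"],
                            "volumeMode": "Block",
                            "resources": {"requests": {"storage": "30Gi"}},
                        },
                    },
                }
            ],
            "template": {
                "spec": {
                    "evictionStrategy": "LiveMigrate",
                    "domain": {
                        "devices": {"disks": [{"name": "rootdisk", "disk": {"bus": "virtio"}}]},
                        "resources": {"requests": {"memory": "2Gi"}},
                    },
                    # The VM disk refers to the DataVolume created from the template above
                    "volumes": [{"name": "rootdisk", "dataVolume": {"name": disk_name}}],
                }
            },
        },
    }


if __name__ == "__main__":
    manifest = build_vm_manifest(VM_NAME, NAMESPACE, STORAGE_CLASS, IMAGE_URL)
    print(json.dumps(manifest, indent=2))
```

      The printed JSON can be saved to a file and applied with oc apply -f <file>.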
          

       

      Actual results:

          The VM does not fail over to a healthy node. Instead, it stays in status Running, and the underlying pod remains in status Terminating forever.
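      The stuck state described above can be spotted from the API objects: a pod that oc get pods reports as Terminating typically has metadata.deletionTimestamp set, and a powered-off node no longer has a Ready condition with status "True". The hypothetical helper below is a diagnostic sketch, not part of any OpenShift tooling. It assumes the output of oc get pods -A -o json and oc get nodes -o json has been saved to two files, and it lists the pods that are being deleted on a node that is not Ready.

```python
import json
import sys


def node_is_ready(node: dict) -> bool:
    """True only if the node's Ready condition has status "True"."""
    for cond in node.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready":
            return cond.get("status") == "True"
    return False  # no Ready condition at all: treat as not ready


def stuck_terminating_pods(pods: dict, nodes: dict) -> list[str]:
    """Return "namespace/name" of pods with a deletionTimestamp on a not-Ready node."""
    not_ready = {n["metadata"]["name"] for n in nodes.get("items", []) if not node_is_ready(n)}
    stuck = []
    for pod in pods.get("items", []):
        meta = pod["metadata"]
        # deletionTimestamp set = deletion requested; the pod is still waiting to go away
        if meta.get("deletionTimestamp") and pod.get("spec", {}).get("nodeName") in not_ready:
            stuck.append(f"{meta.get('namespace', '')}/{meta['name']}")
    return stuck


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python stuck_pods.py pods.json nodes.json
    with open(sys.argv[1]) as f_pods, open(sys.argv[2]) as f_nodes:
        for name in stuck_terminating_pods(json.load(f_pods), json.load(f_nodes)):
            print(name)
```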

      Expected results:

          Once the powered-off node is reported as NotReady, the VM should fail over to a healthy node. This is basic high-availability (HA) behavior.

      Additional info:

      - When the failed node is powered on and becomes Ready again, the VM continues to run.
      - The failed node can be drained and deleted from the cluster manually. In that case, the VM is started on another node.
      BUT: HA requires this to happen automatically, within seconds to minutes, without any administrator intervention.
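      The manual workaround above comes down to two oc commands per failed node. The sketch below only wraps them: by default it prints the commands (dry run) and executes them only when --apply is given. The drain flags and the --timeout value are assumptions. While the node is powered off the drain may not be able to finish, so the wait is bounded and the script goes on to delete the node, which, per the report, is what lets the VM start on another node.

```python
import subprocess
import sys


def workaround_commands(node: str) -> list[list[str]]:
    """Build the oc commands for the manual workaround: drain, then delete, the failed node."""
    return [
        # Assumed flags; --timeout bounds the wait since pods on a dead node may never finish
        ["oc", "adm", "drain", node, "--ignore-daemonsets", "--delete-emptydir-data",
         "--force", "--timeout=120s"],
        ["oc", "delete", "node", node],
    ]


def run(node: str, dry_run: bool = True) -> None:
    for cmd in workaround_commands(node):
        print("+", " ".join(cmd))
        if not dry_run:
            # check=False: a drain timeout should not stop the node deletion step
            subprocess.run(cmd, check=False)


if __name__ == "__main__" and len(sys.argv) >= 2:
    # Usage: python workaround.py <node-name> [--apply]
    run(sys.argv[1], dry_run="--apply" not in sys.argv[2:])
```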

       

              Javier Cano Cano (rh-ee-jcanocan)
              Konstantin Konson (rh-ee-kkonson)
              Geetika Kapoor
              Votes: 0
              Watchers: 6
