OpenShift Virtualization / CNV-75233

Failed migrations are flooding the cluster with virt-launcher pods in Error state


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • CNV v4.23.0
    • CNV Virt-Node
    • Quality / Stability / Reliability
    • 0.42

      Description of problem:

      If the cluster enters a state in which live migrations triggered by kubevirt-workload-update keep failing, a new target virt-launcher pod is created every 5 minutes for every VM in the cluster.
      Each failed attempt leaves its target pod behind in "Error" state, so thousands of such pods accumulate, overloading etcd and potentially making the cluster less responsive within a short time.
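      To illustrate how quickly the pods pile up, here is a minimal Python sketch of the arithmetic. It assumes each failed retry leaves exactly one Error-state target pod behind and that nothing cleans those pods up. The function name errored_pods and the VM counts are hypothetical, chosen only for illustration.

```python
from datetime import timedelta

# Assumption: every failed retry leaves exactly one target virt-launcher pod
# in Error state, and nothing garbage-collects those pods.
RETRY_INTERVAL = timedelta(minutes=5)  # constant interval described in this report
WINDOW = timedelta(days=1)


def errored_pods(vm_count: int, interval: timedelta = RETRY_INTERVAL,
                 window: timedelta = WINDOW) -> int:
    """Hypothetical helper: Error-state target pods accumulated across all VMs in `window`."""
    retries_per_vm = window // interval  # timedelta // timedelta yields an int
    return vm_count * retries_per_vm


if __name__ == "__main__":
    for vms in (10, 50, 200):  # hypothetical cluster sizes
        print(f"{vms} VMs -> {errored_pods(vms)} errored pods per day")
```

      At a constant 5-minute interval each VM contributes 288 errored pods per day, so even a 10-VM cluster accumulates 2,880.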

      Version-Release number of selected component (if applicable):

      all versions

      How reproducible:

      100%, whenever VMIMs (VirtualMachineInstanceMigrations) are failing.

      Steps to Reproduce:

      1. Reproduce a condition that makes live migrations fail, for example bug https://issues.redhat.com/browse/RHEL-131697.
      2. Observe that, for every VM in the cluster, a new target virt-launcher pod is created every 5 minutes and then fails, over and over again.

      Actual results:

      Thousands of virt-launcher pods in Error state accumulate on the cluster within one day.

      Expected results:

      Retries of failed migrations should use an exponential backoff mechanism.

      Additional info:

      If the failure is persistent, retries should not run at a constant 5-minute interval. Instead, each retry interval should be twice as long as the previous one, up to an upper limit of, for example, 4 hours.
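      The proposed retry schedule can be sketched as follows. This is a minimal Python illustration under stated assumptions, not KubeVirt code: retry_delay and retries_within are hypothetical names, the 5-minute base delay is the current constant interval, and the 4-hour cap is the example limit suggested above.

```python
from datetime import timedelta

BASE_DELAY = timedelta(minutes=5)  # assumption: first delay equals today's constant interval
MAX_DELAY = timedelta(hours=4)     # example upper limit suggested in this report


def retry_delay(attempt: int) -> timedelta:
    """Hypothetical helper: delay before retry `attempt` (0-based): 5m, 10m, 20m, ... capped at 4h."""
    if attempt < 0:
        raise ValueError("attempt must be >= 0")
    delay = BASE_DELAY
    # Double once per previous attempt, clamping each step so large attempt
    # numbers never overflow timedelta.
    for _ in range(attempt):
        delay = min(delay * 2, MAX_DELAY)
    return delay


def retries_within(window: timedelta) -> int:
    """Hypothetical helper: number of retries whose backoff wait completes within `window`."""
    elapsed = timedelta(0)
    attempt = 0
    while elapsed + retry_delay(attempt) <= window:
        elapsed += retry_delay(attempt)
        attempt += 1
    return attempt


if __name__ == "__main__":
    print([int(retry_delay(n).total_seconds() // 60) for n in range(8)])
    print(retries_within(timedelta(days=1)))
```

      With these example values, a VM whose migrations keep failing gets 10 retries in the first 24 hours instead of 288. Clamping at every doubling step, rather than computing 2**attempt directly, keeps the delay bounded no matter how many attempts have already failed.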

              sgott@redhat.com Stuart Gott
              ocohen@redhat.com Oren Cohen
              Dan Kenigsberg
              Denys Shchedrivyi