CNV-70607 (OpenShift Virtualization)

virt-handler (virt-launcher?) is hotlooping detecting the completion of a migration -> node drain is stuck


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Undefined
    • None
    • CNV v4.20.0
    • CNV Virt-Node
    • Important
    • None

      Description of problem:

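      virt-handler is hot-looping on detecting the completion of a migration: it logs the same message for the same VMI every few tens of milliseconds, and the count keeps growing.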
      
      See:
      {"component":"virt-handler","kind":"","level":"info","msg":"The target node detected that the migration has completed","name":"rhel9-2423","namespace":"vm-ns-1","pos":"migration-target.go:213","timestamp":"2025-10-14T15:27:07.546287Z","uid":"19ec4892-c15b-4f90-a331-4f5bdb811259"}
      {"component":"virt-handler","kind":"","level":"info","msg":"The target node detected that the migration has completed","name":"rhel9-2423","namespace":"vm-ns-1","pos":"migration-target.go:213","timestamp":"2025-10-14T15:27:07.562381Z","uid":"19ec4892-c15b-4f90-a331-4f5bdb811259"}
      {"component":"virt-handler","kind":"","level":"info","msg":"The target node detected that the migration has completed","name":"rhel9-2423","namespace":"vm-ns-1","pos":"migration-target.go:213","timestamp":"2025-10-14T15:27:07.619072Z","uid":"19ec4892-c15b-4f90-a331-4f5bdb811259"}
      {"component":"virt-handler","kind":"","level":"info","msg":"The target node detected that the migration has completed","name":"rhel9-2423","namespace":"vm-ns-1","pos":"migration-target.go:213","timestamp":"2025-10-14T15:27:07.634057Z","uid":"19ec4892-c15b-4f90-a331-4f5bdb811259"}
      [root@e44-h32-000-r650 ~]# oc logs -n openshift-cnv virt-handler-fz7cl -c virt-handler| grep "The target node detected that the migration has completed" | grep "rhel9-2423" | wc -l
      1811
      ...
      [root@e44-h32-000-r650 ~]# oc logs -n openshift-cnv virt-handler-fz7cl -c virt-handler| grep "The target node detected that the migration has completed" | grep "rhel9-2423" | wc -l
      2177
      ...
      [root@e44-h32-000-r650 ~]# oc logs -n openshift-cnv virt-handler-fz7cl -c virt-handler| grep "The target node detected that the migration has completed" | grep "rhel9-2423" | wc -l
      3606
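
To quantify the loop beyond a raw `wc -l`, the JSON log lines can be grouped per VMI and the gap between repeats measured. The following is a minimal sketch, assuming the log format shown in the excerpt above (one JSON object per line, timestamps with microsecond precision); the function names `parse_ts` and `summarize` are illustrative and not part of any KubeVirt tooling.

```python
import json
from collections import defaultdict
from datetime import datetime

MESSAGE = "The target node detected that the migration has completed"


def parse_ts(value):
    # Assumes timestamps shaped like 2025-10-14T15:27:07.546287Z, as in the excerpt.
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")


def summarize(lines, message=MESSAGE):
    """Return {(namespace, name): (count, mean_gap_seconds)} for one message.

    mean_gap_seconds is None when the message was seen only once.
    """
    stamps = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip shell prompts, counters and other non-JSON lines
        try:
            entry = json.loads(line)
            ts = parse_ts(entry["timestamp"])
        except (ValueError, KeyError):
            continue  # malformed JSON, missing or unexpected timestamp
        if entry.get("msg") != message:
            continue
        stamps[(entry.get("namespace"), entry.get("name"))].append(ts)

    summary = {}
    for key, ts in stamps.items():
        ts.sort()
        gaps = [(b - a).total_seconds() for a, b in zip(ts, ts[1:])]
        mean_gap = sum(gaps) / len(gaps) if gaps else None
        summary[key] = (len(ts), mean_gap)
    return summary
```

Fed with the saved output of `oc logs -n openshift-cnv virt-handler-fz7cl -c virt-handler`, for example `summarize(open("virt-handler.log"))` with a hypothetical file name, this reports how often and how fast each VMI is being re-processed.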
      
      node-drain-controller is trying to evict the virt-launcher pod on that node as part of a node drain.
      The admission webhook correctly sets status.evacuationNodeName on the VMI object. However, the same code block in virt-handler that logs "The target node detected that the migration has completed" also immediately clears Status.EvacuationNodeName (see: https://github.com/kubevirt/kubevirt/blob/0c17b3287760e90ef012d2536818ea3fbf2c5069/pkg/virt-handler/migration-target.go#L200-L214). As a result, the new migration requested by the eviction is never started and the node drain is stuck.
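
The interaction described above can be illustrated with a toy model. This is not KubeVirt code: the class `VMI`, the functions `webhook_evict`, `handler_reconcile`, `evacuation_controller` and `simulate`, and the assumption that the handler always runs before the evacuation controller observes the marker are all hypothetical and chosen only to show why a completion branch that re-runs on every reconcile starves the eviction.

```python
from dataclasses import dataclass


@dataclass
class VMI:
    # Toy stand-in for the VMI status fields discussed in this report.
    evacuation_node_name: str = ""
    completion_handled: bool = False
    completion_logs: int = 0
    new_migration_requested: bool = False


def webhook_evict(vmi, node):
    # Eviction request: mark the VMI for evacuation from `node`.
    vmi.evacuation_node_name = node


def handler_reconcile(vmi, hot_loop):
    # Completion branch: log the message and clear the evacuation marker.
    # With hot_loop=True the branch runs on every pass (the reported symptom);
    # with hot_loop=False it runs once per completed migration.
    if hot_loop or not vmi.completion_handled:
        vmi.completion_logs += 1
        vmi.evacuation_node_name = ""
        vmi.completion_handled = True


def evacuation_controller(vmi):
    # Requests a new migration only if the marker is still set when it looks.
    if vmi.evacuation_node_name:
        vmi.new_migration_requested = True


def simulate(rounds, hot_loop):
    vmi = VMI()
    handler_reconcile(vmi, hot_loop)  # the old migration completes
    for _ in range(rounds):
        webhook_evict(vmi, "node-a")
        handler_reconcile(vmi, hot_loop)  # assumed to win the race every time
        evacuation_controller(vmi)
    return vmi
```

Under these assumptions, `simulate(5, hot_loop=True)` never requests a new migration, while `simulate(5, hot_loop=False)` does on the first round. The real ordering between components is not established in this report, so the model shows only that the symptom is consistent with the suspected cause.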
      
      

      Version-Release number of selected component (if applicable):

      kubevirt-hyperconverged-operator.4.20.0-220

      How reproducible:

      Unknown. Encountered only once, during a large-scale upgrade test with 10k VMIs.

      Steps to Reproduce:

      TBD; the reproduction steps are still unclear.
      

      Actual results:

      An old migration to the node actually succeeded (the VM is really running there, the old source pod is gone, and the target pod is correctly in place).
      Some state is out of sync, but it is not yet clear where.
      virt-handler hot-loops on detecting the end of that migration and keeps removing status.evacuationNodeName, which blocks any future eviction. As a result, the drain of the node is stuck.

      Expected results:

      virt-handler detects once that the old migration completed successfully and stops re-processing it. A new migration can then be triggered and the node can be drained successfully.

      Additional info:

      https://github.com/kubevirt/kubevirt/pull/15721 is a related change in this area; whether it is relevant to this bug is still TBD.

              ffossemo@redhat.com Federico Fossemo
              stirabos Simone Tiraboschi
              Denys Shchedrivyi Denys Shchedrivyi