OpenShift Virtualization / CNV-69382

[network] Interface hot-plug fails due to migration failure


    • Type: Bug
    • Resolution: Done
    • Priority: Undefined
    • CNV v4.20.0
    • Component: CNV Virt-Node
    • Quality / Stability / Reliability
    • Severity: Important

      Description of problem:

      In hot-plug tests, the action frequently fails, and debugging shows that the migration step fails because the source virt-launcher pod remains in the Running state.
      
      This actually looks like a migration issue, but we observe it in hot-plug tests. My guess is that it is related to the fact that, starting with 4.20, migration can only be triggered by a cluster admin (and this is how we test it in tier-2), while hot-plug is performed by a namespace admin (i.e. a non-privileged user), so the migration that the hot-plug requires is executed in the background, with the required privileges granted behind the scenes.
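
      For reference, the migration that hot-plug triggers in the background is represented by a VirtualMachineInstanceMigration object, so its state can be checked directly. A minimal sketch (the namespace is taken from the log below; 'vmim' is the short name of the kubevirt.io/v1 VirtualMachineInstanceMigration resource, and <migration-name> is a placeholder):

      # List the migration created in the background for the hot-plug and check its phase
      kubectl get vmim -n l2-bridge-test-bridge-nic-hot-plug
      # Describe it to see the conditions/events explaining why it is not progressing
      kubectl describe vmim <migration-name> -n l2-bridge-test-bridge-nic-hot-plug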

      Version-Release number of selected component (if applicable):

      CNV v4.20.0.rhel9-146

      How reproducible:

      Not always

      Steps to Reproduce:

      1. Perform an interface hot-plug on a running VM (see the sketch below this list).
      For QE - this can be done by running one of the tier-2 hot-plug tests.
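
      A minimal declarative sketch of the hot-plug step, with hypothetical VM and NetworkAttachmentDefinition names (hot-plug-test-vm, l2-bridge-nad): the new bridge interface is added to the running VM's spec, and the change is applied to the live VMI by the background migration.

      # Append a bridge interface and its matching multus network to the running VM's spec
      kubectl patch vm hot-plug-test-vm --type=json -p '[
        {"op": "add", "path": "/spec/template/spec/domain/devices/interfaces/-",
         "value": {"name": "hotplug-net", "bridge": {}}},
        {"op": "add", "path": "/spec/template/spec/networks/-",
         "value": {"name": "hotplug-net", "multus": {"networkName": "l2-bridge-nad"}}}
      ]'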
      
      

      Actual results:

      In some runs, the action does not complete because the migration gets stuck.

      Expected results:

      The migration succeeds and the NIC hot-plug is completed.

      Additional info:

      When this happens, we see two pods in the Running state - the new target virt-launcher and the old source virt-launcher.
      In the virt-controller log, there is a message explicitly stating that the target pod cannot be scheduled because there are multiple pods in the Running state, e.g.:
      
      2025-09-06T22:16:57.636421642Z {"component":"virt-controller","kind":"","level":"info","msg":"Waiting to schedule target pod for migration because there are already multiple pods running for vmi l2-bridge-test-bridge-nic-hot-plug/hot-plug-test-vm-1757196614-051137","name":"kubevirt-workload-update-gj7mm","namespace":"l2-bridge-test-bridge-nic-hot-plug","pos":"migration.go:1192","timestamp":"2025-09-06T22:16:57.636345Z","uid":"c908e54d-bcf9-43d0-be38-f20c9a7b01e3"}
      
      See attached logs.
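
      A short sketch of how the stuck state looks from the CLI (namespace taken from the log above; the virt-controller deployment is assumed to run in the openshift-cnv namespace):

      NS=l2-bridge-test-bridge-nic-hot-plug
      # Both the old source and the new target virt-launcher pods report Running
      kubectl get pods -n "$NS" -l kubevirt.io=virt-launcher
      # The background migration object never progresses past its early phase
      kubectl get vmim -n "$NS"
      # The virt-controller message quoted above
      kubectl logs -n openshift-cnv deployment/virt-controller | grep "multiple pods running"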
      
      * I submitted this as a network bug, but there's a good chance that this is a virt issue, so my apologies in advance.

       

        1. new-virt-launcer-v8jdn.yaml
          38 kB
        2. old-virt-launcer-kdxlf.yaml
          38 kB
        3. virt-controller.log
          92 kB
        4. old-virt-launcer-kdxlf.log
          3.48 MB
        5. new-virt-launcer-v8jdn.log
          4.05 MB

              sgott@redhat.com Stuart Gott
              ysegev@redhat.com Yossi Segev
              Denys Shchedrivyi