RHEL-117250

A well-timed network failure can resume both migration domains

    • Bug
    • Resolution: Unresolved
    • rhel-9.6
    • rhel-virt-core-libvirt-1
    • x86_64

      What were you trying to do that didn't work?

      We run a pre-copy migration in which the target cuts network connectivity as soon as the migration is over. The trigger for cutting connectivity is the last event the target receives: (event=resumed,detail=migrated).

      There is actually one last message going from the target to the source after that event. If that message doesn't make it to the source, the source will resume its domain thinking the migration has failed.
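
      The race described above can be illustrated with a toy model. This is only a sketch of the behaviour as reported here, not libvirt's actual migration code; the names Domain, finish_migration and both_running are made up for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Domain:
    host: str
    running: bool = False


def finish_migration(final_message_delivered: bool) -> Tuple[Domain, Domain]:
    """Toy model of the end of a pre-copy migration (not libvirt code)."""
    source = Domain("source")  # paused for the final switchover
    target = Domain("target")

    # The target resumes the domain and sees (event=resumed,detail=migrated).
    target.running = True

    # One last message still has to travel from the target to the source.
    if not final_message_delivered:
        # The source never hears back, assumes failure, and resumes its copy.
        source.running = True

    return source, target


def both_running(source: Domain, target: Domain) -> bool:
    """True when the VM is running on both hosts at once."""
    return source.running and target.running
```

      In this model, finish_migration(True) leaves only the target running, while finish_migration(False) leaves both domains running, which is the situation this report is about.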

      What is the impact of this issue to you?

      libvirt should make sure that it's impossible to have a situation where both the source and target get resumed.

      Please provide the package NVR for which the bug is seen:

      libvirt-daemon-common-10.10.0-7.6.el9_6.x86_64

      How reproducible is this bug?:

      In "normal" libvirt scenarios, this bug needs a perfectly timed network failure, which is very rare.

      In OpenShift Virtualization, where the migration is proxied, and where the migration target closes the proxy after the last event, we get a repro every few hundred migrations.

      Steps to reproduce

      1. Start a migration with a process on target that listens to events and breaks connectivity (through a firewall/proxy/...) when receiving the last event (event=resumed,detail=migrated).
      2. With enough tries, you will see the migration target resume because of migration success, and the source resume because of migration failure.
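
      A possible shape for the listener in step 1, using the libvirt Python bindings, is sketched below. It is an assumption about how such a process could look, not the code used in OpenShift Virtualization. make_lifecycle_callback, run_listener and the cut_connectivity hook are hypothetical names; the hook stands in for whatever firewall or proxy action breaks the connection.

```python
from typing import Callable


def make_lifecycle_callback(resumed: int, migrated: int,
                            cut_connectivity: Callable[[], None]):
    """Return a lifecycle callback that calls cut_connectivity once."""
    fired = False

    def on_lifecycle(conn, dom, event, detail, opaque):
        nonlocal fired
        # React only to the first (event=resumed,detail=migrated).
        if not fired and event == resumed and detail == migrated:
            fired = True
            cut_connectivity()

    return on_lifecycle


def run_listener(uri: str, cut_connectivity: Callable[[], None]) -> None:
    """Block forever, watching lifecycle events on the target host."""
    import libvirt  # imported here so the callback is testable without it

    # The event loop implementation must be registered before connecting.
    libvirt.virEventRegisterDefaultImpl()
    conn = libvirt.open(uri)
    conn.domainEventRegisterAny(
        None,  # None means: events for all domains
        libvirt.VIR_DOMAIN_EVENT_ID_LIFECYCLE,
        make_lifecycle_callback(
            libvirt.VIR_DOMAIN_EVENT_RESUMED,
            libvirt.VIR_DOMAIN_EVENT_RESUMED_MIGRATED,
            cut_connectivity,
        ),
        None,
    )
    while True:
        libvirt.virEventRunDefaultImpl()
```

      On the target host this could be started with something like run_listener("qemu:///system", drop_proxy), where drop_proxy is a user-supplied function. Cutting connectivity directly inside the callback keeps the window between the event and the network failure as small as possible, which is what makes the race more likely to trigger.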

      Expected results

      Either migration success or failure, not a mix of both...

      In OpenShift Virtualization, we need the migration target to know when a migration is over and can't rely on the source.

      Actual results

      VM running in both source and target.

        1. source.log (1.69 MB)
        2. target.log (1.59 MB)

              jdenemar@redhat.com Jiri Denemark
              jelejosne Jed Lejosne
              Jiri Denemark
              Liping Cheng