OpenShift Virtualization / CNV-72773

Measure, publish and test stuntime of bridge CNI and OVN localnet during live migration


    • Type: Epic
    • Resolution: Unresolved
    • Priority: Major
    • CNV v4.22.0
    • CNV Network
    • measure-stuntime
    • Quality / Stability / Reliability
    • 77
      • T2 tests ensure that the downtime does not exceed a threshold
      • Downtime is measured in an optimal environment, and the results are published in a blog post or a KCS
      • No downstream docs
      • No UXD
      • No feature
    • Green
    • In Progress
    • 33% To Do, 33% In Progress, 33% Done

      2026-02-23:
      The work has started...


      Goal

      Publish the minimal stuntime (the period during live migration in which the VM's network connectivity is interrupted) measured with our most common secondary network CNIs. This is not a guarantee but a rough estimate, and a way to compare with other virtualization platforms.

      User Stories

      • As a user, I want OpenShift Virtualization to publish the expected stuntime, so I can set my expectations and compare it with results published by other virtualization solutions.
      • As an OpenShift Virtualization QE, I want stuntime measurement to be automated, so that I can easily obtain it on demand.
      • As an OpenShift Virtualization QE, I want to have stuntime test coverage, to make sure that new product code does not break live migration.

      Non-Requirements

      • It is not required to provide a formula for the stuntime, or any guarantee for the worst case scenario.
      • Performance tests are not expected to run as part of our regression tests. The new tests should only detect when the stuntime becomes excessively long, which would suggest a bug in our ARP handling.
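      One way to automate such a coarse regression check is to derive stuntime from the arrival times of periodic probe replies (for example, pings between the static VM and the migrated VM). The sketch below is a minimal illustration under stated assumptions: probes are sent at a fixed interval, the capture window starts before and ends after the migration, and the estimate's resolution is one probe interval. The function names and the threshold value are hypothetical and not part of any existing test framework.

```python
from typing import Sequence


def stuntime_ms(reply_times_ms: Sequence[int], interval_ms: int) -> int:
    """Estimate stuntime from the arrival times of probe replies.

    Assumption: a probe is sent every ``interval_ms`` milliseconds and the
    capture window brackets the migration, so the outage lies between two
    successful replies. The estimate is the longest gap between consecutive
    replies minus the nominal probe interval (resolution: one interval).
    """
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    times = sorted(reply_times_ms)
    if len(times) < 2:
        raise ValueError("need at least two replies to bracket the outage")
    longest_gap = max(later - earlier for earlier, later in zip(times, times[1:]))
    # A healthy link shows gaps of about one interval; only the excess counts.
    return max(0, longest_gap - interval_ms)


def check_stuntime(reply_times_ms: Sequence[int], interval_ms: int, threshold_ms: int) -> int:
    """Fail only when stuntime exceeds a deliberately loose threshold."""
    measured = stuntime_ms(reply_times_ms, interval_ms)
    if measured > threshold_ms:
        raise AssertionError(f"stuntime {measured} ms exceeds threshold {threshold_ms} ms")
    return measured


if __name__ == "__main__":
    # Replies every 100 ms; the probes at 300-800 ms were lost during switchover.
    replies = [0, 100, 200, 900, 1000]
    print(stuntime_ms(replies, interval_ms=100))  # 600
    print(check_stuntime(replies, interval_ms=100, threshold_ms=5000))  # 600
```

      The 5000 ms threshold in the example is an arbitrary placeholder; keeping it loose matches the intent that regression tests catch only severe regressions rather than measure performance.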

      Notes

      • We should have tests that measure the stuntime during live migration, with a reasonable threshold to make sure we don't regress too far. The threshold must not be too tight, since we cannot guarantee how fast the environment will be.
      • We should run a test to find the lowest stuntime we can achieve. We should pick a bare metal cluster and migrate between workers that are close to each other. This should be done with both bridge CNI and OVN localnet. We should measure the stuntime from the migrated VM to a static VM and the other way around, both when the VM is migrating away and when it is migrating back.
      • Scenarios:
        • Localnet and bridge CNI
        • From and to the migrated VM
        • Starting on the same node and migrating away; starting on different nodes and migrating onto the same node; migrating between two nodes that do not host the static VM
        • IPv4 and IPv6
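      The scenario axes above multiply out to 2 × 2 × 3 × 2 = 24 combinations. A minimal sketch of enumerating them, for example to drive a parametrized test, follows. The string labels and the `Scenario` type are hypothetical names chosen for illustration, not identifiers from an existing test suite.

```python
from dataclasses import dataclass
from itertools import product

# Axes taken from the scenario list; the string labels are hypothetical.
CNIS = ("bridge", "localnet")
DIRECTIONS = ("from-migrated-vm", "to-migrated-vm")
PLACEMENTS = (
    "start-together-migrate-away",
    "start-apart-migrate-together",
    "migrate-between-other-nodes",
)
IP_FAMILIES = ("ipv4", "ipv6")


@dataclass(frozen=True)
class Scenario:
    cni: str
    direction: str
    placement: str
    ip_family: str

    @property
    def test_id(self) -> str:
        # Human-readable identifier, e.g. for test reports.
        return "-".join((self.cni, self.direction, self.placement, self.ip_family))


def scenario_matrix() -> list[Scenario]:
    """Return the full cross product of the scenario axes."""
    return [Scenario(*combo) for combo in product(CNIS, DIRECTIONS, PLACEMENTS, IP_FAMILIES)]


if __name__ == "__main__":
    matrix = scenario_matrix()
    print(len(matrix))  # 24
    print(matrix[0].test_id)  # bridge-from-migrated-vm-start-together-migrate-away-ipv4
```

      In pytest, `test_id` could be passed through the `ids` argument of `pytest.mark.parametrize` so each combination is reported by name.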

              Anat Wax (rh-ee-awax)
              Petr Horacek (phoracek@redhat.com)
              Yoss Segev
              Votes: 0
              Watchers: 3
