  1. Fast Datapath Product
  2. FDP-1432

Stabilise chassis-redirect fail-over and minimise packet loss when BFD flaps

    • Bug
    • Resolution: Duplicate
    • Major
    • OVN

      Given an HA router with a chassis-redirect port replicated on two gateways,

      When the BFD session between them drops long enough for OVN to migrate the CR port and later comes back up,

      Then OVN must delay any re-claim until the configured grace period expires, and every hypervisor must forward new flows to the gateway that currently owns the port, without losing tenant traffic.
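
      The grace-period requirement in the criteria above can be illustrated with a minimal Python sketch. This is not OVN code: the class `ReclaimGate`, its methods, and the `grace_s` parameter are hypothetical names chosen for illustration, and the ticket does not say how the grace period is configured.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReclaimGate:
    """Hypothetical hold-down gate for the preferred gateway (not OVN code)."""

    grace_s: float                         # assumed grace period, in seconds
    bfd_up_since: Optional[float] = None   # time BFD last came up; None while down

    def on_bfd_up(self, now: float) -> None:
        # Start the grace timer only on a down -> up transition.
        if self.bfd_up_since is None:
            self.bfd_up_since = now

    def on_bfd_down(self) -> None:
        # Any drop cancels a pending re-claim and resets the timer.
        self.bfd_up_since = None

    def may_reclaim(self, now: float) -> bool:
        # Re-claim is allowed only once BFD has stayed up for the full grace period.
        return (
            self.bfd_up_since is not None
            and now - self.bfd_up_since >= self.grace_s
        )
```

      In this sketch a second drop during the grace window restarts the timer, so a gateway whose BFD session keeps flapping never re-claims the port until the session has been stable for the whole period.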

    • rhel-9
    • rhel-net-ovn
    • ssg_networking

      When the BFD link between two gateway nodes briefly drops, OVN’s “chassis-redirect” port may bounce back and forth:

      1. GW-2 claims the CR port while GW-1 is declared “down”. A few hundred ms later BFD recovers and GW-1 immediately grabs the port back. The cycle repeats, generating bursts of ARP/GARP traffic.
      2. While the gateways fight, compute hosts (hypervisors) can keep sending tenant traffic to the old gateway, causing short but visible outages.

      The goal of this ticket is to eliminate the rapid ping-pong when links flap and to guarantee that hypervisors (HVs) steer traffic to the current owner within one control-loop cycle.
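
      To make the ping-pong concrete, the following Python sketch counts CR-port ownership changes over a short BFD flap trace, with and without a hold-down before re-claim. It is a toy model under stated assumptions, not OVN behaviour: GW-1 is assumed to be the preferred gateway, GW-2 is assumed to claim the port as soon as GW-1 is declared down, and the function name, the trace `FLAPS`, and the `grace_s` parameter are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical trace of (timestamp in seconds, GW-1 BFD up?) transitions:
# three drops of roughly 300 ms each, then the link stays up.
FLAPS: List[Tuple[float, bool]] = [
    (0.0, False), (0.3, True),
    (1.0, False), (1.3, True),
    (2.0, False), (2.3, True),
]


def count_owner_changes(
    events: List[Tuple[float, bool]], grace_s: float, end: float
) -> int:
    """Count ownership changes between GW-1 (preferred) and GW-2.

    Assumes events are time-ordered, alternate between down and up,
    and that GW-1 owns the port before the first event.
    """
    owner = "GW-1"
    changes = 0
    for i, (t, gw1_up) in enumerate(events):
        if not gw1_up:
            # GW-1 declared down: GW-2 takes over if it does not own the port yet.
            if owner == "GW-1":
                owner = "GW-2"
                changes += 1
        else:
            # GW-1 is back: it re-claims only if BFD stays up for grace_s seconds,
            # i.e. until the next transition (or the end of the trace).
            next_t = events[i + 1][0] if i + 1 < len(events) else end
            if owner == "GW-2" and next_t - t >= grace_s:
                owner = "GW-1"
                changes += 1
    return changes
```

      For this trace, `count_owner_changes(FLAPS, 0.0, 60.0)` gives 6 ownership changes, while `count_owner_changes(FLAPS, 5.0, 60.0)` gives 2: one fail-over to GW-2 and one re-claim after the link has been stable.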

              OVN Team (ovnteam@redhat.com)
              Stanislas Faye (rh-ee-sfaye)
              Jianlin Shi