  1. Fast Datapath Product
  2. FDP-662

Multiple cluster/leave calls can result in a leaderless cluster after a downed member returns

    • Bug
    • Resolution: Done
    • Major
    • None
    • OVS FDP 24.C
    • openvswitch3.3
    • None
    • 13
    • False
    • None
    • False
    • Given a multi-node OVN RAFT cluster with nodes explicitly leaving the cluster or experiencing failures,

      When nodes previously removed or marked as leaving rejoin or recover unexpectedly,

      Then the OVN RAFT cluster correctly elects a new leader promptly without entering a permanently leaderless state.
    • rhel-9
    • None
    • rhel-net-ovs-dpdk
    • ssg_networking
    • OVS/DPDK - FDP-25.D
    • 1
    • Important

      When following the steps in a 3-server ovn-sandbox nbdb cluster:

      1. Start an OVN sandbox with the NB servers in a 3-server raft cluster
      2. Kill the nb1 server without leaving the cluster
      3. Have nb3 leave the cluster
      4. Have nb2 leave the cluster
      5. Restart nb1

      the cluster ends up in a broken state with no leader. All 3 servers respond to cluster/status requests, including nb3, which had previously left the cluster successfully. nb1 remains a candidate, while nb2 and nb3 are followers with an unknown leader. All servers show that they are communicating with each other.
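The broken state described above can be checked mechanically. The following is a minimal sketch, not the attached reproducer: it assumes the cluster/status text has been collected from each server and that it contains `Role:` and `Leader:` lines in `Key: value` form. The helper names `parse_status` and `cluster_is_leaderless` are hypothetical, and the sample strings are hand-written from the roles reported in this issue, not captured output.

```python
def parse_status(text):
    """Collect top-level 'Key: value' lines into a dict (indented lines are skipped)."""
    fields = {}
    for line in text.splitlines():
        if line.startswith((" ", "\t")):
            continue
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def cluster_is_leaderless(statuses):
    """Return True when no server in the mapping reports the role 'leader'."""
    return not any(
        parse_status(text).get("Role") == "leader" for text in statuses.values()
    )


# Hand-written stand-ins for the state reported in this issue
# (assumed field names; not real command output).
reported = {
    "nb1": "Role: candidate",
    "nb2": "Role: follower\nLeader: unknown",
    "nb3": "Role: follower\nLeader: unknown",
}

print(cluster_is_leaderless(reported))  # True
```

A check like this could be run in a loop after step 5 to tell a slow election apart from a permanently leaderless cluster.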

      The problem seems to stem from members in a "leaving" state not actually participating in votes/sending commands even before their removal has been accepted by the cluster.
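The vote arithmetic behind this suspicion can be illustrated with a toy model. This is a sketch under stated assumptions, not the ovsdb-server implementation: it assumes a candidate needs votes from a strict majority of the configured members, and that a member in the "leaving" state grants no vote, as the report suggests. The function names are hypothetical.

```python
def majority(n_members):
    """Smallest number of votes that is a strict majority of n_members."""
    return n_members // 2 + 1


def can_elect(members, voting):
    """True if the servers willing to vote form a majority of the configuration."""
    granted = len(set(members) & set(voting))
    return granted >= majority(len(members))


# Assumption: only nb1 votes (for itself); nb2 and nb3 are leaving or have left.
three = ["nb1", "nb2", "nb3"]   # configuration if nb1 still sees all three members
two = ["nb1", "nb2"]            # configuration if nb3's removal was committed

print(can_elect(three, ["nb1"]))         # False: 1 vote, 2 needed
print(can_elect(two, ["nb1"]))           # False: 1 vote, 2 needed
print(can_elect(two, ["nb1", "nb2"]))    # True: nb2 voting would break the deadlock
```

In both configurations nb1 cannot win on its own vote, which matches the observation that nb1 stays a candidate indefinitely.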

      Attached is a reproducer script.

              imaximet@redhat.com Ilya Maximets
              twilson@redhat.com Terry Wilson