FDP-1358 (sub-task of FDP-662), Fast Datapath Product

[RHEL-10 OVS-3.5] Multiple cluster/leave calls can result in a leaderless cluster after a downed member returns

    • Sub-task
    • Resolution: Unresolved
    • openvswitch3.5
    • openvswitch3.5-3.5.0-14.el10fdp
    • rhel-10
    • rhel-net-ovs-dpdk
    • ssg_networking
    • OVS/DPDK - FDP-25.D

      When the following steps are run against a 3-server ovn-sandbox nbdb cluster:

      1. Start an ovn-sandbox with the NB servers in a 3-server Raft cluster
      2. Kill the nb1 server without leaving the cluster
      3. Have nb3 leave the cluster
      4. Have nb2 leave the cluster
      5. Restart nb1

      the cluster ends up in a broken state with no leader. All three servers respond to cluster/status requests, including nb3, which had previously left the cluster successfully. nb1 remains a candidate, while nb2 and nb3 are followers with an unknown leader. All servers report that they are communicating with each other.
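
      The steps above can be sketched as a small Python driver. This is not the attached reproducer, only a hedged illustration. It assumes that each NB server's control socket lives at <sandbox>/<name> and its pidfile at <sandbox>/<name>.pid, and that the database is named OVN_Northbound. The helper names are hypothetical. Starting the sandbox (step 1) and restarting nb1 (step 5) depend on the local setup and are left out.

```python
import os
import signal
import subprocess
from pathlib import Path

DB = "OVN_Northbound"  # assumption: name of the clustered NB database


def appctl(sandbox: Path, server: str, command: str) -> list[str]:
    """Build an ovs-appctl command line for one server."""
    # Assumption: the server's unixctl socket is <sandbox>/<server>.
    return ["ovs-appctl", "-t", str(sandbox / server), command, DB]


def kill_without_leaving(sandbox: Path, server: str) -> None:
    """Step 2: kill a server abruptly so it never sends a leave request."""
    # Assumption: the pidfile is <sandbox>/<server>.pid.
    pid = int((sandbox / f"{server}.pid").read_text().strip())
    os.kill(pid, signal.SIGKILL)


def leave_commands(sandbox: Path) -> list[list[str]]:
    """Steps 3 and 4: nb3 leaves first, then nb2."""
    return [appctl(sandbox, name, "cluster/leave") for name in ("nb3", "nb2")]


def status_commands(sandbox: Path) -> list[list[str]]:
    """After step 5: query every server, including nb3 that already left."""
    return [appctl(sandbox, name, "cluster/status")
            for name in ("nb1", "nb2", "nb3")]


def run_all(commands: list[list[str]]) -> None:
    """Run each command; failures are tolerated so later steps still run."""
    for cmd in commands:
        subprocess.run(cmd, check=False)
```

      A run would call kill_without_leaving(sandbox, "nb1"), then run_all(leave_commands(sandbox)), restart nb1 by whatever means the sandbox provides, and finally run_all(status_commands(sandbox)) to inspect the roles each server reports.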

      The problem seems to stem from members in a "leaving" state not participating in votes or sending commands, even though their removal has not yet been accepted by the cluster.
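
      The vote arithmetic behind this can be illustrated with a simplified majority model. This is only a sketch, not the OVS Raft implementation: it ignores terms and log matching, and the function names are hypothetical. It shows that whether nb1 believes the cluster has two or three members, it needs two votes, and it gets only its own if the leaving members do not vote.

```python
def majority(cluster_size: int) -> int:
    """Smallest number of votes that is more than half of the cluster."""
    return cluster_size // 2 + 1


def can_elect(config: set[str], candidate: str, granting: set[str]) -> bool:
    """Return True if the candidate collects a majority of the configuration.

    config:   members the candidate believes are in the cluster
    granting: other members that are willing to grant their vote
    """
    # The candidate votes for itself; votes from non-members do not count.
    votes = ({candidate} | granting) & config
    return len(votes) >= majority(len(config))


# nb1 restarts as a candidate; nb2 and nb3 are "leaving" and do not vote.
three_member_view = can_elect({"nb1", "nb2", "nb3"}, "nb1", granting=set())
two_member_view = can_elect({"nb1", "nb2"}, "nb1", granting=set())

# If a leaving member still voted until its removal was accepted,
# nb1 would reach the required two votes.
with_leaving_voter = can_elect({"nb1", "nb2", "nb3"}, "nb1", granting={"nb2"})
```

      In this model both of the first two cases evaluate to False and the last one to True, which matches the observed state where nb1 stays a candidate indefinitely.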

      Attached is a reproducer script.

              imaximet@redhat.com Ilya Maximets
              twilson@redhat.com Terry Wilson