OpenShift Bugs / OCPBUGS-361

Need ability to detect raft split brain with OVN DBs and reconcile


Details

    • Bug
    • Resolution: Won't Do
    • Undefined
    • None
    • 4.7
    • None
    • Important
    • Rejected
    • False
    Description

      Description of problem:
      Some scenarios can cause one or more of the 3 DBs to form its own raft cluster, which leaves OVN in a badly broken state. Currently ovn-dbchecker does not check for mismatching cluster IDs across nodes. At a minimum, we need a way to alert when this has happened. Better still, once the situation is detected, the cluster should be recovered. The recovery is drastic and involves:

      1. deleting all the DB files
      2. restarting all ovnkube master pods
      3. running the following on all ovn-controllers (or deleting them):
        ovn-appctl sb-cluster-state-reset
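
      As an illustration of the detection half (comparing cluster IDs across nodes), here is a minimal sketch. It assumes the text of each DB's `cluster/status` output has already been collected per node and that it contains a line of the form `Cluster ID: <short-id> (<uuid>)`. The function names and the sample data are made up for this example and are not part of ovn-dbchecker.

      ```python
      import re
      from collections import defaultdict

      # Assumed line format in a cluster/status dump: "Cluster ID: 4ac9 (<uuid>)".
      CLUSTER_ID_RE = re.compile(r"^Cluster ID:\s*(\S+)", re.MULTILINE)


      def parse_cluster_id(status_text):
          """Return the short cluster ID from one status dump, or None if absent."""
          match = CLUSTER_ID_RE.search(status_text)
          return match.group(1) if match else None


      def group_by_cluster_id(status_by_node):
          """Map each observed cluster ID to the sorted list of nodes reporting it."""
          groups = defaultdict(list)
          for node, text in status_by_node.items():
              groups[parse_cluster_id(text)].append(node)
          return {cid: sorted(nodes) for cid, nodes in groups.items()}


      def is_split_brain(status_by_node):
          """True when the nodes report more than one distinct cluster ID."""
          ids = {cid for cid in group_by_cluster_id(status_by_node) if cid is not None}
          return len(ids) > 1


      if __name__ == "__main__":
          # Made-up sample: master-2 has formed its own cluster.
          sample = {
              "master-0": "Name: OVN_Southbound\nCluster ID: aaaa (made-up)\n",
              "master-1": "Name: OVN_Southbound\nCluster ID: aaaa (made-up)\n",
              "master-2": "Name: OVN_Southbound\nCluster ID: bbbb (made-up)\n",
          }
          print(group_by_cluster_id(sample))
          print("split brain:", is_split_brain(sample))
      ```

      Nodes whose status could not be parsed are grouped under `None` and ignored for the split-brain decision, so an unreachable DB alone does not raise the alert in this sketch.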
         

      One possible solution is a component that identifies this scenario and then annotates all of the ovnk pods to signal which of the above recovery steps each of them needs to perform.
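
      To make the annotation idea concrete, here is a rough sketch of how such a component might compute what to write on each pod. The annotation key and the action names are hypothetical placeholders invented for this example. Nothing here is an existing ovn-kubernetes API, and actually applying the annotations through the Kubernetes API is left out.

      ```python
      # Hypothetical annotation key and action values, invented for illustration.
      RECOVERY_ANNOTATION = "example.com/ovn-raft-recovery"
      ACTION_MASTER = "wipe-db-and-restart"      # recovery steps 1 and 2
      ACTION_NODE = "sb-cluster-state-reset"     # recovery step 3


      def plan_recovery_annotations(master_pods, node_pods):
          """Return {pod_name: {annotation_key: action}} for a full cluster reset.

          master_pods: names of the ovnkube master pods (they host the DBs).
          node_pods:   names of the pods running ovn-controller.
          """
          plan = {}
          for pod in master_pods:
              plan[pod] = {RECOVERY_ANNOTATION: ACTION_MASTER}
          for pod in node_pods:
              plan[pod] = {RECOVERY_ANNOTATION: ACTION_NODE}
          return plan


      if __name__ == "__main__":
          # Made-up pod names.
          plan = plan_recovery_annotations(
              ["ovnkube-master-a", "ovnkube-master-b", "ovnkube-master-c"],
              ["ovnkube-node-x", "ovnkube-node-y"],
          )
          for pod, annotations in sorted(plan.items()):
              print(pod, annotations)
      ```

      Each pod would then watch for its own annotation and carry out the matching step, which keeps the detecting component from having to exec into every pod itself.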

          People

            mkennell@redhat.com Martin Kennelly
            trozet@redhat.com Tim Rozet
            Anurag Saxena
            Votes: 0
            Watchers: 4
