OpenShift Bugs / OCPBUGS-361

Need ability to detect raft split brain with OVN DBs and reconcile


    • Bug
    • Resolution: Won't Do
    • Undefined
    • None
    • 4.7
    • None
    • Quality / Stability / Reliability
    • False
    • None
    • Important
    • None
    • None
    • Rejected
    • None
    • None
    • None
    • None
    • None
    • None
    • None
    • None

      Description of problem:
      In some scenarios, one or more of the three DBs forms its own raft cluster (a split brain). This leaves OVN in a badly broken state. Currently, ovn-dbchecker does not check for mismatched cluster IDs across nodes. At a minimum, we need a way to alert when this has happened. Better still, once the situation is detected, we should recover the cluster. The recovery is drastic and involves:

      1. deleting all of the DB files
      2. restarting all ovnkube master pods
      3. running the following on all ovn-controllers (or deleting them):
        ovn-appctl sb-cluster-state-reset
         

      One possible solution is a component that detects this scenario and then annotates all of the ovnk pods to signal which of the above steps each one needs to perform.
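
The detection half of this could look roughly like the sketch below. It is only an illustration, not how ovn-dbchecker works: it assumes that something has already collected the text of each DB server's `cluster/status` output per node, and that this output contains a line beginning with `Cluster ID:`. The function names and the node names in the usage example are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: each status dump contains a line such as
# "Cluster ID: 8d5b (8d5b...)"; only the short ID is captured.
CLUSTER_ID_RE = re.compile(r"^Cluster ID:\s*(\S+)", re.MULTILINE)


def parse_cluster_id(status_text):
    """Return the short cluster ID from one status dump, or None if absent."""
    match = CLUSTER_ID_RE.search(status_text)
    return match.group(1) if match else None


def group_by_cluster_id(status_by_node):
    """Map each cluster ID to the sorted list of nodes that report it."""
    groups = defaultdict(list)
    for node, text in status_by_node.items():
        groups[parse_cluster_id(text)].append(node)
    return {cid: sorted(nodes) for cid, nodes in groups.items()}


def is_split_brain(status_by_node):
    """True when the nodes report more than one distinct cluster ID.

    Nodes whose output could not be parsed (ID of None) are ignored here;
    a real checker would probably want to alert on those separately.
    """
    groups = group_by_cluster_id(status_by_node)
    known_ids = [cid for cid in groups if cid is not None]
    return len(known_ids) > 1


if __name__ == "__main__":
    # Hypothetical input: two nodes agree, one has formed its own cluster.
    statuses = {
        "master-0": "Name: OVN_Southbound\nCluster ID: aaaa (aaaa1111)\n",
        "master-1": "Name: OVN_Southbound\nCluster ID: aaaa (aaaa1111)\n",
        "master-2": "Name: OVN_Southbound\nCluster ID: bbbb (bbbb2222)\n",
    }
    print(is_split_brain(statuses))
    print(group_by_cluster_id(statuses))
```

Grouping nodes by cluster ID, rather than returning only a boolean, gives an alert or a later recovery step the list of nodes on each side of the split.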

              mkennell@redhat.com Martin Kennelly
              trozet@redhat.com Tim Rozet
              None
              None
              Anurag Saxena
              None
              Votes: 0
              Watchers: 4

                Created:
                Updated:
                Resolved: