Details
Bug
Resolution: Won't Do
4.7
Important
Rejected
Description
Description of problem:
Some scenarios can cause one or more of the 3 DBs to form its own raft cluster, which leaves OVN in a very broken state. Currently ovn-dbchecker does not check for mismatched cluster IDs across nodes. At a minimum, we need a way to alert when this has happened. Better still, once we detect the situation we should recover the cluster. The recovery is drastic and involves:
- deleting all the DB files
- restarting all ovnkube master pods
- running the following on all ovn-controllers (or deleting them):
ovn-appctl sb-cluster-state-reset
One possible solution is a component that identifies this scenario and then annotates all of the ovnk pods to signal which of the above steps each of them needs to perform.
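As a rough illustration of the detection half only, the sketch below compares the cluster ID reported by each node for one database. It assumes the status text for each node has already been collected (for example from the DB server's cluster/status output) and that it contains a line of the form "Cluster ID: <short id> (<uuid>)". The function names and node names are hypothetical, and this is not how ovn-dbchecker is implemented.

```python
import re
from collections import defaultdict

# Assumed line format in the status dump: "Cluster ID: 9d5a (9d5a0000-...)".
CLUSTER_ID_RE = re.compile(r"^Cluster ID:\s*(\S+)", re.MULTILINE)


def parse_cluster_id(status_text):
    """Return the short cluster ID from one status dump, or None if absent."""
    match = CLUSTER_ID_RE.search(status_text)
    return match.group(1) if match else None


def group_nodes_by_cluster_id(status_by_node):
    """Map each observed cluster ID to the sorted list of nodes reporting it.

    Nodes whose status text has no cluster ID line are grouped under None.
    """
    groups = defaultdict(list)
    for node, text in status_by_node.items():
        groups[parse_cluster_id(text)].append(node)
    return {cid: sorted(nodes) for cid, nodes in groups.items()}


def has_cluster_id_mismatch(status_by_node):
    """True when the nodes do not all report the same cluster ID."""
    return len(group_nodes_by_cluster_id(status_by_node)) > 1
```

A checker could call has_cluster_id_mismatch once per database and raise an alert when it returns True; the grouping shows which nodes have split off. Treating an unparseable status as its own group is a deliberate choice here, so that a node that cannot report a cluster ID is flagged rather than ignored.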