Type: Epic
Resolution: Unresolved
Priority: Major
libovsdb doesn't provide any db constraints besides db indexes, so the db can end up holding incorrect data, e.g. two pods with the same IP.
We have already hit multiple bugs caused by inconsistent data in the db. To be more proactive and catch such cases before they cause problems, we can add data consistency checks that run outside the db, e.g. https://github.com/ovn-org/ovn-kubernetes/pull/2760/files#diff-b61468c0028d28eb0e1060d24f53f28b6482acfbfd0cf5c15f9c23c3fd8c9f91R530
Other examples are the stale SNAT cleanup in https://github.com/openshift/network-tools/pull/69 and the equivalent ACL cleanup, as in https://issues.redhat.com/browse/OCPBUGS-772
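To make the idea of an external check concrete, here is a minimal sketch for the duplicate-IP case mentioned above. It does not use libovsdb; the `PodIP` type and `FindDuplicateIPs` function are hypothetical names, and the rows are assumed to have already been read from the db by some other code.

```go
package main

import "sort"

// PodIP is a hypothetical, simplified view of one db row:
// which pod claims which IP. It is not a libovsdb model.
type PodIP struct {
	Pod string // e.g. "namespace/name"
	IP  string
}

// FindDuplicateIPs returns every IP that is claimed by more than one
// distinct pod, mapped to the sorted list of pods claiming it.
// Repeated rows for the same pod and IP are not counted as a conflict.
func FindDuplicateIPs(rows []PodIP) map[string][]string {
	// owners[ip] is the set of pods that claim this IP.
	owners := make(map[string]map[string]struct{})
	for _, r := range rows {
		if owners[r.IP] == nil {
			owners[r.IP] = make(map[string]struct{})
		}
		owners[r.IP][r.Pod] = struct{}{}
	}

	dups := make(map[string][]string)
	for ip, pods := range owners {
		if len(pods) < 2 {
			continue // a single owner is consistent
		}
		names := make([]string, 0, len(pods))
		for p := range pods {
			names = append(names, p)
		}
		sort.Strings(names) // stable output for logs and tests
		dups[ip] = names
	}
	return dups
}
```

A non-empty result would be the trigger for reporting the inconsistency; the db itself is not modified by this check.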
Since every db inconsistency should indicate a bug in our code, we want to not only clean up the db but also signal the problem. That can be done e.g. via metrics or events.
When we find a new bug that wasn't caught earlier, a customer's db may already be in a bad state, so we need a way to clean up the db while we work on the fix. We have some scripts for this in network-tools now, but we should consider another mechanism that automatically fixes the db inconsistencies we already know of and lets us ship new fixes for all cluster versions.
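One possible shape for such a mechanism is a list of known checks, each with a detector and an optional fix, run by a common loop that records how many problems were found and fixed. The sketch below is only an illustration under stated assumptions: the db is modelled as a plain `map[string]string` (SNAT name to owning pod), and `Check`, `Result`, `RunChecks` and `StaleSNATCheck` are hypothetical names, not existing ovn-kubernetes or network-tools APIs. The `Result` counts stand in for whatever metric or event would actually be emitted.

```go
package main

// Check is a hypothetical description of one known inconsistency.
type Check struct {
	Name string
	// Detect returns the keys of the offending rows.
	Detect func(db map[string]string) []string
	// Fix repairs a single offending row; nil means "report only".
	Fix func(db map[string]string, key string)
}

// Result is what a run would report, e.g. as metric values.
type Result struct {
	Found int
	Fixed int
}

// RunChecks runs every check, applies the fix where one exists,
// and returns the per-check counts keyed by check name.
func RunChecks(db map[string]string, checks []Check) map[string]Result {
	results := make(map[string]Result, len(checks))
	for _, c := range checks {
		bad := c.Detect(db)
		res := Result{Found: len(bad)}
		if c.Fix != nil {
			for _, key := range bad {
				c.Fix(db, key)
				res.Fixed++
			}
		}
		results[c.Name] = res
	}
	return results
}

// StaleSNATCheck is an example check: a row is stale when the pod
// it points to is not in the set of live pods.
func StaleSNATCheck(livePods map[string]bool) Check {
	return Check{
		Name: "stale-snat",
		Detect: func(db map[string]string) []string {
			var stale []string
			for name, pod := range db {
				if !livePods[pod] {
					stale = append(stale, name)
				}
			}
			return stale
		},
		Fix: func(db map[string]string, key string) {
			delete(db, key)
		},
	}
}
```

Keeping detection separate from the fix means a check can ship in report-only mode first (`Fix` left nil) and gain a fix later, and the `Found` count is still available to signal that a bug exists even when the cleanup succeeds.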