-
Story
-
Resolution: Done
-
Major
-
Need add manual test
-
SDN Sprint 223, SDN Sprint 224
The OVN northbound and southbound databases each run as a RAFT cluster. For background on RAFT, see: https://web.stanford.edu/~ouster/cgi-bin/cs190-winter21/lecture.php?topic=raft
This story focuses on creating alerts based on the metrics ovn_db_cluster_inbound_connections_error_total / ovn_db_cluster_inbound_connections_total.
ovn_db_cluster_inbound_connections_total gauge
- Description
- A metric that returns the total number of inbound connections to the server, labeled by database name, cluster UUID, and server UUID. Only connections from other RAFT cluster members are counted.
- Normal/Expected values
- The total count of RAFT cluster members minus one. For example, a 3-member RAFT cluster should show 2 inbound connections.
ovn_db_cluster_inbound_connections_error_total gauge
- Description
- A metric that returns the total number of failed inbound connections to the server, labeled by database name, cluster UUID, and server UUID. Only errors on connections from other RAFT cluster members are counted, not errors from the CMS, northd, or ovn-controller.
- Normal/Expected values
- 0
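The expected values above can be sketched as a small check. This is only an illustration and not the alert rules from the PR: the `InboundSample` container, the function names, and passing the cluster size in as a parameter are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InboundSample:
    """Hypothetical container: one reading of both gauges for one DB server."""
    db_name: str
    inbound_connections: int        # ovn_db_cluster_inbound_connections_total
    inbound_connection_errors: int  # ovn_db_cluster_inbound_connections_error_total


def expected_inbound(cluster_size: int) -> int:
    """Each server expects one inbound connection from every other member."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size - 1


def check_sample(sample: InboundSample, cluster_size: int) -> List[str]:
    """Return human-readable problems; an empty list means the values look normal."""
    problems = []
    want = expected_inbound(cluster_size)
    if sample.inbound_connections != want:
        problems.append(
            f"{sample.db_name}: {sample.inbound_connections} inbound "
            f"connections, expected {want}"
        )
    if sample.inbound_connection_errors != 0:
        problems.append(
            f"{sample.db_name}: {sample.inbound_connection_errors} inbound "
            f"connection errors, expected 0"
        )
    return problems


if __name__ == "__main__":
    # A healthy server in a 3-member cluster, then one with a missing peer.
    print(check_sample(InboundSample("OVN_Northbound", 2, 0), cluster_size=3))
    print(check_sample(InboundSample("OVN_Southbound", 1, 4), cluster_size=3))
```

A real alert would also need a hold duration so that a brief reconnect does not fire it; the thresholds and durations actually used are defined in the commit referenced below.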
PR that includes the commit for this change: https://github.com/openshift/cluster-network-operator/pull/1526
See commit titled: OVN-K alerts: add ovn db connection alerts
See the commit code for the 8 alerts.