-
Story
-
Resolution: Done
-
Major
-
Need to add manual test
-
SDN Sprint 223, SDN Sprint 224
The OVN northbound and southbound databases each run as a Raft cluster. For background on Raft, see: https://web.stanford.edu/~ouster/cgi-bin/cs190-winter21/lecture.php?topic=raft
This story creates alerts based on the ovn_db_cluster_server_status metric.
ovn_db_cluster_server_status gauge
- Description
- A metric with a constant '1' value, labeled by database name, cluster uuid, server uuid, and server status. The ‘server_status’ label, which represents the Raft status of the db, can be: ‘joining cluster’, ‘leaving cluster’, ‘left cluster’, ‘failed’, ‘disconnected from the cluster (election timeout)’ or ‘cluster member’.
- Normal/Expected values
- For each database (nb or sb), for the vast majority of the time, there is an entry whose ‘server_status’ label is ‘cluster member’.
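As a rough illustration of the expected-values rule above, the following Python sketch flags any sample whose server_status is not 'cluster member'. It is not the alert code from the PR. The `db_name` label key, the (labels, value) sample layout, and the database names used in the usage example are assumptions made for illustration; only the `server_status` label and its possible values come from the description above.

```python
# Minimal sketch: detect cluster member errors from already-parsed samples
# of ovn_db_cluster_server_status. Not the actual alert rules from the PR.

HEALTHY_STATUS = "cluster member"  # from the metric description
DB_LABEL = "db_name"               # hypothetical label key for the database name


def find_member_errors(samples):
    """Return (database, server_status) pairs for unhealthy entries.

    `samples` is an iterable of (labels, value) tuples, where `labels` is a
    dict of label name to label value. This layout is an assumption.
    """
    errors = []
    for labels, value in samples:
        # The metric is a constant 1; ignore anything else as not present.
        if value != 1:
            continue
        status = labels.get("server_status")
        if status != HEALTHY_STATUS:
            errors.append((labels.get(DB_LABEL), status))
    return errors


if __name__ == "__main__":
    # Example database names are placeholders for the nb and sb databases.
    example = [
        ({"db_name": "nb", "server_status": "cluster member"}, 1),
        ({"db_name": "sb", "server_status": "failed"}, 1),
    ]
    for db, status in find_member_errors(example):
        print(f"{db}: {status}")
```

An alert built on the same idea would presumably also require the condition to hold for some duration, since transient states such as 'joining cluster' are expected during normal operation.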
PR that includes the commit for this change: https://github.com/openshift/cluster-network-operator/pull/1526
See commit titled: OVN-K alerts: add ovn db cluster member error
See the code for the two alerts, which fire if there is a cluster member error.