Fast Datapath Product / FDP-761

Identify missing metrics required for OVN observability on OpenStack

    • Type: Task
    • Resolution: Done
    • Priority: Major
    • OVN
    • 8

      Given the need for comprehensive network observability in RHOSO,

      When an engineer reviews the currently available OVN metrics,

      Then they should identify any missing metrics related to ovn-db-raft, ovn-controller, and ovn-northd that are required to support monitoring, network policy correlation, and topology context for OVN in OpenStack.

    • rhel-net-ovn
    • ssg_networking
    • OVN FDP Sprint 9
    • 1

      OVN observability is a first step toward supporting network observability for RHOSO. It consists mainly of three parts:

      • Monitoring data collection from OVS/OVN
      • Network policy correlation
      • Topology context for OVN, VM endpoints, service/application

      This ticket focuses on the data collection part and aims to identify the gaps between the required metrics and the metrics already exposed by OVN.
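
      The gap analysis amounts to diffing the required metric names against what an exporter actually serves. A minimal Python sketch, assuming the metrics are scraped in the Prometheus text exposition format and that each required name appears with a component prefix such as "ovn_db_"; the prefix, the sample scrape, and the helper names are hypothetical, not taken from an existing OVN exporter.

```python
from __future__ import annotations

import re

# Metric-name grammar from the Prometheus data model; a name ends at "{" or whitespace.
_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def exposed_metric_names(scrape_text: str) -> set[str]:
    """Collect the metric names that have at least one sample line in a scrape."""
    names: set[str] = set()
    for raw in scrape_text.splitlines():
        line = raw.strip()
        # Blank lines and "# HELP" / "# TYPE" comments carry no samples.
        if not line or line.startswith("#"):
            continue
        match = _NAME_RE.match(line)
        if match:
            names.add(match.group(0))
    return names


def missing_metrics(required: list[str], prefix: str, scrape_text: str) -> list[str]:
    """Return the required (unprefixed) names that the scrape does not expose."""
    exposed = exposed_metric_names(scrape_text)
    return [name for name in required if prefix + name not in exposed]


# Hypothetical scrape text, for illustration only.
SAMPLE_SCRAPE = """\
# TYPE ovn_db_db_size gauge
ovn_db_db_size{db_name="OVN_Northbound"} 12345
ovn_db_cluster_term{db_name="OVN_Northbound"} 7
"""

if __name__ == "__main__":
    required = ["db_size", "cluster_term", "cluster_leader", "ovsdb_monitors"]
    print(missing_metrics(required, "ovn_db_", SAMPLE_SCRAPE))
```

      Running the same diff once per component (ovn-db-raft, ovn-controller, ovn-northd), each with its own prefix, would produce the per-component gap list this ticket asks for.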

      Below is the list of metrics expected to support network observability for RHOSO.

      1. ovn-db-raft metrics
       

      Name | Definition
      build_info | A metric with a constant '1' value labeled by the ovsdb-server version and the NB and SB schema versions
      db_size | The size of the database file associated with the OVN DB component
      cluster_election_timer | A metric that returns the current election timer value, labeled by database name, cluster UUID, and server UUID
      cluster_id | A metric with a constant '1' value labeled by database name and cluster UUID
      cluster_server_id | A metric with a constant '1' value labeled by database name, cluster UUID, and server UUID
      cluster_server_role | A metric with a constant '1' value labeled by database name, cluster UUID, server UUID, and server role
      cluster_server_status | A metric with a constant '1' value labeled by database name, cluster UUID, server UUID, and server status
      cluster_server_vote | A metric with a constant '1' value labeled by database name, cluster UUID, server UUID, and server vote
      cluster_term | A metric that returns the current election term value, labeled by database name, cluster UUID, and server UUID
      cluster_leader | Identifies whether this pod is the leader for the given database
      cluster_inbound_connections_error_total | A metric that returns the total number of failed inbound connections to the server, labeled by database name, cluster UUID, and server UUID
      cluster_inbound_connections_total | A metric that returns the total number of inbound connections to the server, labeled by database name, cluster UUID, and server UUID
      cluster_log_index_next | A metric that returns the next log entry index value, labeled by database name, cluster UUID, and server UUID
      cluster_log_index_start | A metric that returns the starting log entry index value, labeled by database name, cluster UUID, and server UUID
      cluster_log_not_applied | A metric that returns the number of log entries not yet applied, labeled by database name, cluster UUID, and server UUID
      cluster_log_not_committed | A metric that returns the number of log entries not yet committed, labeled by database name, cluster UUID, and server UUID
      cluster_outbound_connections_error_total | A metric that returns the total number of failed outbound connections from the server, labeled by database name, cluster UUID, and server UUID
      cluster_outbound_connections_total | A metric that returns the total number of outbound connections from the server, labeled by database name, cluster UUID, and server UUID
      jsonrpc_server_sessions | Number of active JSON-RPC server sessions to the DB
      log_entry_index | The index of the log entry currently exposed to clients. The values reported by all instances of the database should be close to each other; otherwise, the instances are said to be lagging behind each other.
      ovsdb_monitors | Number of OVSDB monitors on the server
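
      The log_entry_index definition implies a simple consistency check: compare each member's value against the most advanced member and flag the members that fall too far behind. A minimal Python sketch, assuming the per-server values have already been scraped into a dict for one database; the MAX_LAG threshold and the server names are hypothetical. It also checks that exactly one member reports cluster_leader = 1, which is what a settled Raft cluster is expected to show (a cluster in the middle of an election may briefly report none).

```python
from __future__ import annotations

# Hypothetical tolerance in log entries; not an OVN default.
MAX_LAG = 100


def lagging_servers(log_entry_index: dict[str, int], max_lag: int = MAX_LAG) -> dict[str, int]:
    """Return {server: lag} for members more than max_lag entries behind the newest one."""
    if not log_entry_index:
        return {}
    newest = max(log_entry_index.values())
    return {
        server: newest - index
        for server, index in log_entry_index.items()
        if newest - index > max_lag
    }


def has_single_leader(cluster_leader: dict[str, int]) -> bool:
    """True when exactly one member reports cluster_leader == 1 for a database."""
    return sum(1 for value in cluster_leader.values() if value == 1) == 1


if __name__ == "__main__":
    # Hypothetical per-server samples for one database.
    index = {"server-a": 5020, "server-b": 5018, "server-c": 4700}
    leader = {"server-a": 1, "server-b": 0, "server-c": 0}
    print(lagging_servers(index))  # {'server-c': 320}
    print(has_single_leader(leader))  # True
```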

       

      2. ovn-controller metrics

      Name | Definition
      integration_bridge_patch_ports_total | Captures the number of patch ports that connect the br-int OVS bridge to the physical OVS bridge and the br-local OVS bridge
      integration_bridge_openflow_total | The total number of OpenFlow flows in the integration bridge
      integration_bridge_geneve_ports_total | The total number of OVN Geneve ports on the node
      lflow_run | Number of times ovn-controller has translated the Logical_Flow table in the OVN SB database into OpenFlow flows
      remote_probe_interval (a) | The maximum number of milliseconds of idle time on the connection to the OVN SB DB before an inactivity probe message is sent
      openflow_probe_interval (a) | The maximum number of milliseconds of idle time on the OpenFlow connection to the OVS bridge before an inactivity probe message is sent
      monitor_all (a) | Specifies whether ovn-controller monitors all records of the tables in the OVN SB DB. A value of 0 means it conditionally monitors only the records needed by the current chassis.
      encap_ip (a) | A metric with a constant '1' value labeled by ipaddress, which specifies the encapsulation IP address configured on that node
      sb_connection_method (a) | A metric with a constant '1' value labeled by sb_connection_method, which specifies the ovn-remote value configured on that node
      encap_type (a) | A metric with a constant '1' value labeled by type, which specifies the encapsulation type that a chassis should use to connect to this node
      bridge_mappings | A metric with a constant '1' value labeled by mapping, which specifies a list of key-value pairs that map a physical network name to a local OVS bridge providing connectivity to that network
      packet_in | Specifies the number of times ovn-controller has handled packet-ins from ovs-vswitchd
      packet_in_drop | Specifies the number of times ovn-controller has dropped packet-ins from ovs-vswitchd due to resource constraints
      rconn_sent | Specifies the number of messages sent over the underlying virtual connection (unix, tcp, or ssl) to OpenFlow devices
      rconn_queued | Specifies the number of messages queued because they could not be sent over the underlying virtual connection to OpenFlow devices
      rconn_discarded | Specifies the number of messages dropped because the send queue was flushed on reconnection
      rconn_overflow | Specifies the number of messages dropped because the send queue overflowed
      vconn_open | Specifies the number of attempts to connect to an OpenFlow device
      vconn_sent | Specifies the number of messages sent to the OpenFlow device
      vconn_received | Specifies the number of messages received from the OpenFlow device
      stream_open | Specifies the number of attempts to connect to a remote peer (active connection)
      txn_success | Specifies the number of times an OVSDB transaction completed successfully
      txn_error | Specifies the number of times an OVSDB transaction failed with an error
      txn_uncommitted | Specifies the number of times an OVSDB transaction was left uncommitted
      txn_unchanged | Specifies the number of times an OVSDB transaction resulted in no change to the database
      txn_incomplete | Specifies the number of times an OVSDB transaction did not complete and the client had to retry
      txn_aborted | Specifies the number of times an OVSDB transaction was aborted
      txn_try_again | Specifies the number of times an OVSDB transaction failed and the client had to retry
      netlink_sent | Number of netlink messages sent to the kernel
      netlink_recv | Number of netlink messages received from the kernel
      netlink_recv_jumbo | Number of netlink messages received from the kernel that were larger than the allocated buffer
      netlink_overflow | Number of netlink messages dropped by the daemon due to buffer overflow
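
      Many names in this table (lflow_run, packet_in, packet_in_drop, rconn_*, vconn_*, stream_open, txn_*, netlink_*) appear to correspond to OVS/OVN coverage counters, which ovn-controller reports through "ovn-appctl -t ovn-controller coverage/show". Assuming each counter line in that output ends in "total: N", a small Python parser can turn it into a name-to-total map and list which required counters were not reported. The sample text and helper names are illustrative, not captured output, and a counter missing from the output may simply not have been hit yet rather than be unsupported.

```python
from __future__ import annotations

import re

# Assumed counter-line shape: "<name>   <rate>/sec   <rate>/sec   <rate>/sec   total: <N>".
_COUNTER_RE = re.compile(r"^(?P<name>[a-z_][a-z0-9_]*)\s.*\btotal:\s*(?P<total>\d+)$")

# Hypothetical excerpt of coverage/show output, for illustration only.
SAMPLE_COVERAGE = """\
Event coverage, avg rate over last: 5 seconds, last minute, last hour,  hash=00000000:
lflow_run                  0.0/sec     0.000/sec        0.0003/sec   total: 12
packet_in                  0.2/sec     0.150/sec        0.0800/sec   total: 310
txn_success                0.0/sec     0.017/sec        0.0011/sec   total: 45
"""


def parse_coverage(text: str) -> dict[str, int]:
    """Map each coverage counter name to its cumulative total; other lines are ignored."""
    totals: dict[str, int] = {}
    for line in text.splitlines():
        match = _COUNTER_RE.match(line.strip())
        if match:
            totals[match.group("name")] = int(match.group("total"))
    return totals


def unreported(required: list[str], totals: dict[str, int]) -> list[str]:
    """Required counters absent from the output (possibly just never hit)."""
    return [name for name in required if name not in totals]


if __name__ == "__main__":
    totals = parse_coverage(SAMPLE_COVERAGE)
    print(totals)
    print(unreported(["lflow_run", "packet_in", "packet_in_drop"], totals))
```

      Because these totals only ever grow, an exporter built on this approach would naturally expose them as Prometheus counters rather than gauges.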

       

      3. ovn-northd metrics

      Name | Definition
      status | Specifies whether this instance of ovn-northd is standby (0), active (1), or paused (2)
      probe_interval | The maximum number of milliseconds of idle time on the connection to the OVN SB and NB DBs before an inactivity probe message is sent
      pstream_open | Specifies the number of times passive connections were opened for remote peers to connect to
      stream_open | Specifies the number of attempts to connect to a remote peer
      txn_success | Specifies the number of times an OVSDB transaction completed successfully
      txn_error | Specifies the number of times an OVSDB transaction failed with an error
      txn_uncommitted | Specifies the number of times an OVSDB transaction was left uncommitted
      txn_unchanged | Specifies the number of times an OVSDB transaction resulted in no change to the database
      txn_incomplete | Specifies the number of times an OVSDB transaction did not complete and the client had to retry
      txn_aborted | Specifies the number of times an OVSDB transaction was aborted
      txn_try_again | Specifies the number of times an OVSDB transaction failed and the client had to retry
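
      The status row defines a three-way numeric encoding. A minimal Python sketch of producing that gauge, assuming the state is read from "ovn-appctl -t ovn-northd status", which replies with a line of the form "Status: active"; the metric name ovn_northd_status and the helper functions are hypothetical.

```python
from __future__ import annotations

# Encoding taken from the status row: standby=0, active=1, paused=2.
STATUS_CODES = {"standby": 0, "active": 1, "paused": 2}


def northd_status_value(reply: str) -> int:
    """Convert a 'Status: <state>' reply into the numeric status code."""
    key, sep, state = reply.partition(":")
    if not sep or key.strip().lower() != "status":
        raise ValueError(f"unexpected reply: {reply!r}")
    state = state.strip().lower()
    if state not in STATUS_CODES:
        raise ValueError(f"unknown ovn-northd state: {state!r}")
    return STATUS_CODES[state]


def render_gauge(value: int, name: str = "ovn_northd_status") -> str:
    """Render a single gauge sample in Prometheus text exposition format."""
    return f"# TYPE {name} gauge\n{name} {value}\n"


if __name__ == "__main__":
    print(render_gauge(northd_status_value("Status: active")), end="")
```

      Raising on an unknown state, instead of silently mapping it to 0, keeps a future ovn-northd state from being misreported as standby.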

              amusil@redhat.com Ales Musil
              rh-ee-sfaye Stanislas Faye
              Jianlin Shi