Managed Service - Streams / MGDSTRM-8922

Expose BrokerState value from broker


    • MK - Sprint 221

      What

      The BrokerState metric should be exposed to Prometheus and added to a dashboard so that the support team can see which state each broker is in.

      Why

      The broker state reveals the current internal state of the broker. This is important for understanding the state of the service, and it is critical information for SRE when diagnosing problems with the service.

       

      • NOT_RUNNING((byte) 0): The state the broker is in when it first starts up.
      • STARTING((byte) 1): The state the broker is in when it is catching up with cluster metadata.
      • RECOVERY((byte) 2): The broker has caught up with cluster metadata, but has not yet been unfenced by the controller.
      • RUNNING((byte) 3): The state the broker is in when it has registered at least once, and is accepting client requests.
      • PENDING_CONTROLLED_SHUTDOWN((byte) 6): The state the broker is in when it is attempting to perform a controlled shutdown.
      • SHUTTING_DOWN((byte) 7): The state the broker is in when it is shutting down.
      • UNKNOWN((byte) 127): The broker is in an unknown state.
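
      For dashboard or tooling work, the numeric values listed above can be turned back into readable names. The following is a minimal Python sketch, not code from the broker; the helper name decode_broker_state is hypothetical, and mapping unlisted values to UNKNOWN is an assumption made for illustration.

```python
from enum import IntEnum


class BrokerState(IntEnum):
    """Numeric broker states, copied from the list in this ticket."""
    NOT_RUNNING = 0
    STARTING = 1
    RECOVERY = 2
    RUNNING = 3
    PENDING_CONTROLLED_SHUTDOWN = 6
    SHUTTING_DOWN = 7
    UNKNOWN = 127


def decode_broker_state(value: float) -> BrokerState:
    """Map a sampled metric value to a state name.

    Values that are not in the list (for example 4 or 5), and values that
    cannot be converted to an integer, are reported as UNKNOWN. That fallback
    is an assumption of this sketch.
    """
    try:
        return BrokerState(int(value))
    except (ValueError, OverflowError):
        return BrokerState.UNKNOWN
```

      Prometheus sample values are floats, which is why the helper accepts a float and converts it before the lookup.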

      BrokerState is currently exposed via the labels on the metric, because labels are far more readable. However, that means we can't use the metric to track the time spent in each state. Exposing the state as the metric's value will give us a view of how long Kafka brokers spend in each state.
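
      As an illustration of why the numeric value helps, here is a rough Python sketch of how time per state could be estimated once the state is recorded as the sample value. It assumes samples arrive as (timestamp in seconds, state value) pairs ordered by time, and it attributes each interval to the state seen at the start of that interval. The function name seconds_per_state and the 15-second spacing in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def seconds_per_state(samples: Iterable[Tuple[float, int]]) -> Dict[int, float]:
    """Estimate the time spent in each state from time-ordered samples.

    Each gap between two consecutive samples is credited to the state of the
    earlier sample, so the result is only as precise as the scrape interval.
    """
    totals: Dict[int, float] = defaultdict(float)
    previous = None
    for timestamp, state in samples:
        if previous is not None:
            prev_timestamp, prev_state = previous
            totals[prev_state] += timestamp - prev_timestamp
        previous = (timestamp, state)
    return dict(totals)


# Example: a broker that is STARTING (1), then in RECOVERY (2), then RUNNING (3).
samples = [(0, 1), (15, 1), (30, 2), (45, 3), (60, 3)]
print(seconds_per_state(samples))  # {1: 30.0, 2: 15.0, 3: 15.0}
```

      A state that begins and ends between two scrapes is not visible to this kind of estimate, so short-lived states may be under-reported.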

      How

      • Currently BrokerState is exposed via the metric's labels, and the metric itself has a fixed value. Expose the state as the value of the metric as well.

      Done

      • The metric is exposed and recorded in Prometheus with the broker state as its value as well as in its labels.

       

       

              sbarker@redhat.com Sam Barker
              keithbwall Keith Wall
              Kafka Integrations