Type: Epic
Resolution: Done
Priority: Critical
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
- kafka-integrations-apac-refinement-done
- kafka-integrations-europe-refinement-done

Epic Name: Expose Broker State
Blocked: False
Blocked Reason: None
Ready: False
Discussed with Team: No
Epic Status: In Progress
Feature Link: MGDSRVS-48 - Be able to sustain an external paying customer in production
Hierarchy Progress Bar: 0% To Do, 0% In Progress, 100% Done

What

The BrokerState metric should be exposed to Prometheus and added to a dashboard so that the support team can understand the state of the broker.

Why

The broker state reveals the current internal state of the broker. This is important for understanding the state of the service, and it is critical information for the SRE when diagnosing problems with the service.

- NOT_RUNNING((byte) 0): The state the broker is in when it first starts up.
- STARTING((byte) 1): The state the broker is in when it is catching up with cluster metadata.
- RECOVERY((byte) 2): The broker has caught up with cluster metadata, but has not yet been unfenced by the controller.
- RUNNING((byte) 3): The state the broker is in when it has registered at least once, and is accepting client requests.
- PENDING_CONTROLLED_SHUTDOWN((byte) 6): The state the broker is in when it is attempting to perform a controlled shutdown.
- SHUTTING_DOWN((byte) 7): The state the broker is in when it is shutting down.
- UNKNOWN((byte) 127): The broker is in an unknown state.
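
Because the metric is reported as a number, anyone reading the dashboard needs to map the value back to a state name. A minimal sketch of that mapping, using only the values listed above; the class and the `decode_broker_state` helper are hypothetical names for illustration, not part of any existing tooling:

```python
from enum import IntEnum


class BrokerState(IntEnum):
    """Numeric broker states as listed in this ticket."""
    NOT_RUNNING = 0
    STARTING = 1
    RECOVERY = 2
    RUNNING = 3
    PENDING_CONTROLLED_SHUTDOWN = 6
    SHUTTING_DOWN = 7
    UNKNOWN = 127


def decode_broker_state(value: float) -> BrokerState:
    """Map a raw metric sample to a state.

    Values that are not in the list above (for example 4 or 5),
    or that are not finite numbers, are reported as UNKNOWN.
    Fractional values are truncated towards zero before lookup.
    """
    try:
        return BrokerState(int(value))
    except (ValueError, OverflowError):
        return BrokerState.UNKNOWN
```

For example, `decode_broker_state(3.0)` yields `BrokerState.RUNNING`, while a value with no listed meaning such as `4` falls back to `BrokerState.UNKNOWN`.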

How

Expose the Kafka JMX MBean to Prometheus: https://github.com/bf2fc6cc711aee1a0c2a/kas-fleetshard/blob/main/operator/src/main/resources/kafka-metrics.yaml
Have the metric remote-written to Central Observatorium: https://github.com/bf2fc6cc711aee1a0c2a/observability-resources-mk/blob/main/resources/prometheus/remote-write.yaml
Expose the metric on the dashboard. Include sufficient context on the dashboard so that the SRE can understand what each state means.
Once MGDSTRM-8173 is complete, that SOP should consider covering this metric to help the SRE understand the state of the service.
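
Once the metric is scraped, one way to sanity-check it is to read the Prometheus text exposition output and list the brokers that are not in the RUNNING state (value 3). The sketch below is illustrative only: the metric name `kafka_server_kafkaserver_brokerstate` is an assumption (the real name depends on the rules in kafka-metrics.yaml), the `pod` label in the usage note is hypothetical, and the parser is simplified and does not handle `}` inside label values.

```python
import re
from typing import List, Tuple

# Assumed metric name; the actual name depends on the exporter rules in use.
METRIC_NAME = "kafka_server_kafkaserver_brokerstate"
RUNNING = 3  # RUNNING((byte) 3), per the state list in this ticket

# Simplified sample-line pattern: name, optional {labels}, then the value.
_SAMPLE = re.compile(
    r"^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)"
    r"(?:\{(?P<labels>[^}]*)\})?"
    r"\s+(?P<value>\S+)"
)


def brokers_not_running(exposition_text: str) -> List[Tuple[str, int]]:
    """Return (labels, state) for each METRIC_NAME sample that is not RUNNING."""
    result = []
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE.match(line)
        if match is None or match.group("name") != METRIC_NAME:
            continue
        try:
            state = int(float(match.group("value")))
        except (ValueError, OverflowError):
            continue  # ignore samples without a usable numeric value
        if state != RUNNING:
            result.append((match.group("labels") or "", state))
    return result
```

Given two samples labelled `pod="kafka-0"` with value 3 and `pod="kafka-1"` with value 2, the function returns only the `kafka-1` entry, which corresponds to RECOVERY in the list above.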

Done

Metric exposed on the dashboard

mentioned on

Merge request - MGDSTRM-8773: Expose BrokerState metric to CEE dashboard

Merge request - MGDSTRM-8773: release tasks

Assignee: Sam Barker
Reporter: Keith Wall
Team: Kafka Integrations
Votes: 0
Watchers: 3
Created: 2022/06/07 2:42 PM
Updated: 2022/08/05 12:38 PM
Resolved: 2022/08/05 12:38 PM
