Epic
Resolution: Obsolete
Critical
Expose partition size metric
To Do
MGDSRVS-170 - Improve Monitoring, Metrics and Observability capabilities to support Production Workloads
25% To Do, 0% In Progress, 75% Done

This is a continuation of https://issues.redhat.com/browse/MGDSTRM-8663 and https://issues.redhat.com/browse/MGDSTRM-8488, extended so that customers can self-serve in identifying and addressing imbalanced clusters.
WHAT
As a customer, I may choose a topic/partition strategy that causes broker storage to be used unevenly. One pathological case is a topic with a single partition and the default replication factor of 3 that grows until it fills its brokers' disks. This fills 3 of the 6 brokers while the remaining 3 stay empty. The customer then can't publish any more messages, even to topics that reside on the other (non-full) brokers. Cruise Control can't fix this: a partition can't be split across brokers, so moving its replicas about only fills a different set of brokers. The problem is really in the customer's domain: a poor partitioning strategy.
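The arithmetic of the single-partition case can be sketched in Python. This is a minimal illustration under stated assumptions: the 100 GiB per-broker capacity, the partition name `orders-0`, and the helper `broker_usage` are hypothetical, and the second assignment is a hand-written stand-in for any replica move, not Cruise Control's actual algorithm.

```python
GIB = 1024 ** 3


def broker_usage(assignments: dict[str, list[int]],
                 sizes: dict[str, int],
                 brokers: list[int]) -> dict[int, int]:
    """Sum the bytes stored on each broker, counting every replica of every partition."""
    usage = {b: 0 for b in brokers}
    for partition, replicas in assignments.items():
        for broker in replicas:
            usage[broker] += sizes[partition]
    return usage


brokers = list(range(6))
capacity = 100 * GIB             # hypothetical per-broker storage capacity
sizes = {"orders-0": 100 * GIB}  # hypothetical single partition that has grown to fill a broker

# Replication factor 3: the partition's three replicas land on three brokers.
before = broker_usage({"orders-0": [0, 1, 2]}, sizes, brokers)
# Moving all three replicas elsewhere just fills a different set of brokers.
after = broker_usage({"orders-0": [3, 4, 5]}, sizes, brokers)

full_before = [b for b, used in before.items() if used >= capacity]
full_after = [b for b, used in after.items() if used >= capacity]
print(full_before, full_after)                           # [0, 1, 2] [3, 4, 5]
print(sum(before.values()) / (capacity * len(brokers)))  # 0.5
```

Only half of the cluster's total storage is in use, yet three brokers are full in either placement.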
WHY
Customers won't understand why they can't utilise all their storage.
HOW
Expose metrics and document the process to follow in determining which partitions are close to filling up which brokers. This can be a combination of newly exposed metrics and CLI operations (not necessarily the RHOAS CLI; the Kafka admin scripts can also be used).
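One way to find which partitions are filling which brokers with the existing Kafka admin scripts is to run `bin/kafka-log-dirs.sh --bootstrap-server <host:port> --describe` (plus `--command-config` for an authenticated cluster) and aggregate its JSON output per broker. The sketch below assumes that output format (a `brokers` array whose `logDirs` entries list `partitions` with `partition` and `size` fields, preceded by status lines), assumes the caller is permitted to describe log dirs, and uses a hypothetical per-broker capacity and 80% threshold. `parse_log_dirs` and `brokers_near_full` are illustrative helpers, not part of any existing tool.

```python
import json
from collections import defaultdict


def parse_log_dirs(output: str) -> dict[int, dict[str, int]]:
    """Return {broker_id: {partition_name: size_bytes}} from kafka-log-dirs.sh --describe output."""
    # Status lines are printed before the JSON document; take the first line that starts with '{'.
    json_line = next(line for line in output.splitlines() if line.lstrip().startswith("{"))
    doc = json.loads(json_line)
    usage: dict[int, dict[str, int]] = {}
    for broker in doc["brokers"]:
        per_partition: dict[str, int] = defaultdict(int)
        for log_dir in broker["logDirs"]:
            for p in log_dir["partitions"]:
                if p.get("isFuture"):
                    continue  # skip replicas still being moved between log directories
                per_partition[p["partition"]] += p["size"]
        usage[broker["broker"]] = dict(per_partition)
    return usage


def brokers_near_full(usage: dict[int, dict[str, int]], capacity_bytes: int,
                      threshold: float = 0.8, top_n: int = 3):
    """List (broker, used_fraction, largest partitions) for brokers at or above the threshold."""
    findings = []
    for broker, parts in sorted(usage.items()):
        fraction = sum(parts.values()) / capacity_bytes
        if fraction >= threshold:
            top = sorted(parts.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
            findings.append((broker, round(fraction, 2), top))
    return findings


# Hypothetical sample output for a two-broker cluster.
sample = """Querying brokers for log directories information
Received log directory information from brokers 0,1
{"version":1,"brokers":[{"broker":0,"logDirs":[{"logDir":"/var/lib/kafka/data","error":null,"partitions":[{"partition":"orders-0","size":900,"offsetLag":0,"isFuture":false},{"partition":"audit-0","size":50,"offsetLag":0,"isFuture":false}]}]},{"broker":1,"logDirs":[{"logDir":"/var/lib/kafka/data","error":null,"partitions":[{"partition":"audit-0","size":50,"offsetLag":0,"isFuture":false}]}]}]}"""

for broker, fraction, top in brokers_near_full(parse_log_dirs(sample), capacity_bytes=1000):
    print(f"broker {broker}: {fraction:.0%} used, largest partitions: {top}")
# broker 0: 95% used, largest partitions: [('orders-0', 900), ('audit-0', 50)]
```

Ranking each near-full broker's partitions by size points at the topics whose partitioning strategy is the likely cause.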
DONE
Include the following where applicable:
- A new partition storage metric
- Metric documented here
- A KB article describing the problem and the process to follow for identifying and rectifying it.