- Epic
- Resolution: Unresolved
- Major
- Kafka Roller 2.0
- To Do
- 80% To Do, 20% In Progress, 0% Done
Background
The existing KafkaRoller has the following shortcomings:
- It doesn’t understand log recovery
  - It misinterprets log recovery as a problem and restarts the broker again.
- It doesn’t take partition leadership into account
  - It should let a broker resume preferred leadership after rolling.
- Its availability check, KafkaAvailability, is too resource-intensive
  - It holds all topic descriptions in memory at once.
- It’s difficult to reason about
- For KRaft, we may need logic for process.roles=controller and process.roles=broker,controller
- On large clusters, the duration of rolling restarts becomes a problem
- There is tension with Cruise Control, e.g. over leadership
Kafka Roller 2.0
A new KafkaRoller design has been proposed and prototyped by tbentley-1, with the following aspects:
Enhanced Broker Status Reflection
- An observation abstraction is introduced that collects information about the broker from different sources, such as Kubernetes (Pod status etc.), the Kafka Admin API (ISR state etc.) and metrics endpoints.
- A dedicated Kafka broker metrics endpoint is exposed that is better able to understand and report the broker state (e.g. in_log_recovery).
- The rich observations enable a more sophisticated classification of the broker status, one that reflects the broker's actual state more accurately.
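The observe-then-classify idea above can be illustrated with a minimal sketch. It is not the prototype's code: the `Observation` fields, the `BrokerState` names and the ordering of the checks are all assumptions made for illustration. Only `in_log_recovery` and the three information sources come from the description above.

```python
from dataclasses import dataclass
from enum import Enum, auto


class BrokerState(Enum):
    # Hypothetical state names; the proposal does not enumerate them here.
    NOT_RUNNING = auto()
    RECOVERING = auto()
    NOT_READY = auto()
    SERVING = auto()


@dataclass(frozen=True)
class Observation:
    """One snapshot of a broker, merged from several sources."""
    pod_running: bool      # from Kubernetes Pod status
    pod_ready: bool        # from Kubernetes Pod status
    in_log_recovery: bool  # from the dedicated broker metrics endpoint
    in_sync: bool          # from the Kafka Admin API (ISR state)


def classify(obs: Observation) -> BrokerState:
    """Map an observation to exactly one state (assumed precedence)."""
    if not obs.pod_running:
        return BrokerState.NOT_RUNNING
    # Checked before readiness so that log recovery is not mistaken
    # for an unhealthy broker.
    if obs.in_log_recovery:
        return BrokerState.RECOVERING
    if not obs.pod_ready or not obs.in_sync:
        return BrokerState.NOT_READY
    return BrokerState.SERVING
```

With this ordering, a broker that is running but not yet ready because it is replaying its logs is classified as `RECOVERING` rather than `NOT_READY`, which is the distinction the existing roller is said to miss.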
Repeatable, Predictable Rolling
- The set of broker states forms a state machine that defines the available states and the possible transitions between them.
- The roller is responsible for continuously collecting observations, classifying the broker state, and reconciling brokers in unhealthy states via the defined state machine transitions.
- A processor abstraction is introduced for processing state transitions.
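As a rough sketch of how a state machine and a processor might fit together, the example below uses a transition table plus a function that picks an action per state. The states, the allowed transitions and the `Action` values are hypothetical and are not taken from the prototype.

```python
from enum import Enum, auto


class State(Enum):
    # Hypothetical states for illustration only.
    NOT_RUNNING = auto()
    RECOVERING = auto()
    NOT_READY = auto()
    SERVING = auto()
    RESTARTING = auto()


class Action(Enum):
    NONE = auto()
    WAIT = auto()
    RESTART = auto()


# Assumed transition table: which states may follow which.
TRANSITIONS = {
    State.NOT_RUNNING: {State.RECOVERING, State.NOT_READY, State.SERVING},
    State.RECOVERING: {State.NOT_READY, State.SERVING, State.NOT_RUNNING},
    State.NOT_READY: {State.SERVING, State.RESTARTING, State.NOT_RUNNING},
    State.SERVING: {State.RESTARTING, State.NOT_READY, State.NOT_RUNNING},
    State.RESTARTING: {State.NOT_RUNNING, State.RECOVERING,
                       State.NOT_READY, State.SERVING},
}


def check_transition(current: State, target: State) -> State:
    """Return target if the move is allowed, otherwise raise ValueError."""
    if target is not current and target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target


def process(state: State, needs_roll: bool) -> Action:
    """Pick the next action for a broker in the given state."""
    if state in (State.NOT_RUNNING, State.RECOVERING, State.RESTARTING):
        return Action.WAIT      # let the broker come up on its own
    if state is State.NOT_READY:
        return Action.RESTART   # unhealthy and not recovering
    return Action.RESTART if needs_roll else Action.NONE
```

Keeping the transition table as data separates the question of which moves are legal from the question of what the roller should do next, so each can be reviewed and tested on its own.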
How
- Create a Strimzi proposal upstream
- Work on the existing prototype to reach a functional PoC
  - The PoC introduced clear interfaces between observe, classify and process, which allows work to proceed in parallel using a test-driven approach for validation.
- Expose dedicated broker metrics via a broker-side component (Java agent, metrics reporter, etc.)
- Develop an alpha version of Kafka Roller 2.0 in Strimzi behind a feature flag, to be promoted to a GA feature over time.
- Thorough test coverage
  - Use property-based testing to validate correctness
  - Define and implement a full set of test cases, starting with golden-path tests that establish the testing framework and structure
  - Continue defining further test cases
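To make the property-testing item concrete, here is a minimal sketch using only the Python standard library. The decision rule `should_restart` and the property it is checked against (a broker in log recovery is never restarted) are assumptions chosen for illustration. A dedicated property-testing library would add features such as shrinking of failing inputs.

```python
import random
from typing import NamedTuple


class Obs(NamedTuple):
    pod_running: bool
    in_log_recovery: bool
    in_sync: bool


def should_restart(obs: Obs, needs_roll: bool) -> bool:
    """Hypothetical decision rule under test."""
    if not obs.pod_running or obs.in_log_recovery:
        return False  # wait instead of restarting
    # Only roll a broker that is in sync, to protect availability.
    return needs_roll and obs.in_sync


def check_property(trials: int = 1000, seed: int = 0) -> int:
    """Check on random inputs that recovering brokers are never restarted."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    for _ in range(trials):
        obs = Obs(*(rng.random() < 0.5 for _ in range(3)))
        needs_roll = rng.random() < 0.5
        if obs.in_log_recovery:
            assert not should_restart(obs, needs_roll), obs
    return trials
```

Calling `check_property()` runs the invariant against 1000 generated observations. The same harness could be extended with further properties as more test cases are defined.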
- relates to: ENTMQST-4126 Various improvements to the KafkaRoller (New)
- links to