-
Epic
-
Resolution: Done
-
Critical
-
None
-
None
-
None
-
Reduce return to service time following abnormal broker shutdown
-
False
-
None
-
False
-
No
-
To Do
-
MGDSRVS-48 - Be able to sustain an external paying customer in production
-
0% To Do, 0% In Progress, 100% Done
WHAT
If a Kafka broker shuts down abnormally for any reason (for instance, node or storage failure), it may need to enter a log recovery state on the next startup in order to repair its log files. While the broker is in this state, the instance will be degraded or even offline, depending on how many of the instance's brokers need recovery.
Recovery can be a time-consuming process, especially for Kafka brokers holding large amounts of data.
RHOSAK currently uses Kafka's default configuration, which performs log recovery with a single thread per data directory (num.recovery.threads.per.data.dir). To reduce the return to service time, the number of recovery threads should be increased.
WHY
Reduce time taken to return an instance to full service.
HOW
Investigate the best number of recovery threads and verify the resulting improvement in recovery time. See the spike task.
Update the service to use the chosen number of threads.
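For illustration, the change might look like the following broker configuration fragment. This is only a sketch: the value 4 is a placeholder rather than the outcome of the spike, and this ticket does not state how the service applies broker configuration, so the mechanism for rolling it out is left open.

```properties
# Threads per data directory used for log recovery at startup
# (and for flushing at shutdown). The source describes the default as one thread.
# The value below is a hypothetical placeholder; use the number chosen by the spike.
num.recovery.threads.per.data.dir=4
```

Because the setting applies per data directory, the total number of recovery threads on a broker scales with the number of log directories it has configured.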
- relates to
-
MGDSTRM-9154 Expose log recovery metrics on support dashboards
- Backlog