  1. Managed Service - Streams
  2. MGDSTRM-9154

Expose log recovery metrics on support dashboards


    • Expose Log Recovery Metrics
    • True
    • Awaiting Kafka 3.3.0.
    • False
    • No
    • To Do
    • MGDSRVS-48 - Be able to sustain an external paying customer in production


      KIP-831 exposes log recovery metrics, which help support monitor the progress of log recovery, a process that can take hours to complete. The service should expose these metrics on the support dashboard so that support users can better understand the state of a Kafka instance.


      Log recovery runs when a broker starts up after an unclean shutdown; it checks that the logs are in a good state and not corrupted. If the broker stores a lot of log data, recovery can take hours or even days to complete. Until now, there has been no way to tell how far recovery is from completion. These metrics will let the support team track the progress of log recovery.


      1. Expose the Kafka JMX mbean to Prometheus: https://github.com/bf2fc6cc711aee1a0c2a/kas-fleetshard/blob/main/operator/src/main/resources/kafka-metrics.yaml
      2. Have the metrics remote-written to Central Observatorium: https://github.com/bf2fc6cc711aee1a0c2a/observability-resources-mk/blob/main/resources/prometheus/remote-write.yaml
      3. Expose the metrics on the dashboard. Include sufficient context on the dashboard so that SRE can understand what the state means.
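
      As a rough illustration of step 1, the rules below sketch how the two KIP-831 gauges might be mapped in a Prometheus JMX exporter configuration. The MBean names (remainingLogsToRecover, remainingSegmentsToRecover) and their dir/threadNum tags reflect my reading of KIP-831 and should be checked against a Kafka 3.3.0 broker. The Prometheus metric names and help texts are hypothetical, and how these rules fit into the existing kafka-metrics.yaml is an assumption.

```yaml
# Sketch only: jmx_exporter rules for the KIP-831 log recovery gauges.
# MBean names and tag order are assumptions to verify against Kafka 3.3.0.
rules:
  # Logs still waiting to be recovered, per log directory.
  - pattern: kafka.log<type=LogManager, name=remainingLogsToRecover, dir=(.+)><>Value
    name: kafka_log_remaining_logs_to_recover        # hypothetical metric name
    type: GAUGE
    help: Number of logs in this log directory that still need recovery
    labels:
      dir: "$1"
  # Segments still waiting to be recovered, per recovery thread.
  - pattern: kafka.log<type=LogManager, name=remainingSegmentsToRecover, dir=(.+), threadNum=(\d+)><>Value
    name: kafka_log_remaining_segments_to_recover    # hypothetical metric name
    type: GAUGE
    help: Number of segments the recovery thread still has to process
    labels:
      dir: "$1"
      thread: "$2"
```

      If rules along these lines are adopted, the resulting metric names would also need to be allowed through the remote-write configuration in step 2. On the dashboard in step 3, a value of zero (or an absent series) would presumably indicate that recovery is finished or was never needed.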


      • Metrics exposed to support dashboard.


            Unassigned
            lukchen@redhat.com Luke Chen
            Kafka Integrations
