Managed Service - Streams: MGDSTRM-8039

Create alert for severe disk skew


    • MK - Sprint 218

      What

      We want an alert that fires when the disk usage on one broker is much higher than on the others. This alert will be the trigger for the SREs to run Cruise Control so that the imbalance is remediated.

      The alert will be written so that it also detects the condition in the 3-broker case, as there are some corner cases there that could lead to it (e.g. modifications to the replication factor, RF). We accept that the Cruise Control MVP won't remediate these problems, but it is still preferable to know that the condition exists.

      How

      (KW) I was wondering if we could take a statistical approach, perhaps using ideas from https://prometheus.io/blog/2015/06/18/practical-anomaly-detection/: alert when "disk usage on a broker > x% and disk usage > n standard deviations above the mean disk usage for the whole cluster".
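      A minimal Python sketch of the proposed rule, assuming per-broker disk usage is already available as percentages. The function name skewed_brokers and the parameters min_pct (the "x%") and n_sigma (the "n") are hypothetical illustrations, not part of any existing alert. It uses the population standard deviation, which is what Prometheus's stddev aggregation computes.

```python
from statistics import fmean, pstdev


def skewed_brokers(usage_pct: dict[str, float], min_pct: float, n_sigma: float) -> list[str]:
    """Return brokers whose disk usage exceeds min_pct AND lies more than
    n_sigma population standard deviations above the cluster mean.

    Hypothetical helper illustrating the proposed alert condition.
    """
    values = list(usage_pct.values())
    mean = fmean(values)
    sigma = pstdev(values)  # population std dev, as Prometheus stddev() uses
    if sigma == 0:
        return []  # every broker has identical usage: no skew
    return sorted(
        broker
        for broker, pct in usage_pct.items()
        if pct > min_pct and (pct - mean) / sigma > n_sigma
    )


# Six brokers, one of them far fuller than the rest.
print(skewed_brokers({"b0": 40, "b1": 41, "b2": 39, "b3": 40, "b4": 42, "b5": 85},
                     min_pct=70, n_sigma=2))
```

      One caveat relevant to the 3-broker requirement: with the population standard deviation, no single value can sit more than sqrt(N - 1) standard deviations above the mean of N values (Samuelson's inequality). For 3 brokers that bound is sqrt(2), about 1.41, so a threshold of n = 2 could never fire there even if one broker is at 90% and the others at 10%; the threshold (or the approach) would need to account for cluster size.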

      Done

       

            kstanley@redhat.com Kate Stanley
            keithbwall Keith Wall
            Kafka Integrations
