Managed Service - Streams / MGDSTRM-8033

Dataplane SLO dashboard should use the minimum to aggregate over Kafka instances (KI)


    • Type: Task
    • Resolution: Done
    • Priority: Undefined
    • dashboards
    • Sprint: MK - Sprint 218

      WHAT

      The dataplane SLO dashboard currently uses the arithmetic mean to aggregate availability over all Kafka instances.

      Update the dashboard to fix the skewed behaviour reported in https://issues.redhat.com/browse/MGDSTRM-7999.

      WHY

      The arithmetic mean is meaningless as an SLO signal, because a majority of healthy instances hides the ones that are failing. Using it means that the Error Budget Policy (EBP) won't kick in even though one or more instances have run out of error budget. That can have knock-on effects for other services that depend on RHOSAK. For example, if one of those Kafka instances were being used by SmartEvents, the EBP not kicking in would mean RHOSAK could keep consuming error budget, unaware that SmartEvents' SLO was already being broken.
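      The masking effect described above can be shown with a few numbers. This is a minimal sketch. The instance names, the availability figures and the 99.9% SLO target are hypothetical values chosen for illustration. They are not real RHOSAK data or the dashboard's actual query.

```python
"""Illustrative only: averaging availability across Kafka instances hides an
instance that has breached its SLO, while taking the minimum exposes it.

Instance names, availability figures and the SLO target are hypothetical.
"""

SLO_TARGET = 0.999  # hypothetical availability SLO

# Hypothetical availability per Kafka instance over the reporting window.
availability = {
    "kafka-a": 0.99995,
    "kafka-b": 0.99990,
    "kafka-c": 0.99990,
    "kafka-d": 0.99800,  # this instance has breached the SLO
}

mean_availability = sum(availability.values()) / len(availability)
min_availability = min(availability.values())


def meets_slo(value: float) -> bool:
    """True if the aggregated availability is at or above the SLO target."""
    return value >= SLO_TARGET


print(f"mean={mean_availability:.5f} meets SLO: {meets_slo(mean_availability)}")
print(f"min={min_availability:.5f} meets SLO: {meets_slo(min_availability)}")
```

      With these numbers the mean reports the SLO as met, even though kafka-d is below target. The minimum correctly reports a breach.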

      HOW

      To fix this, change the query to take the minimum of the remaining error budget over all instances, i.e. report the worst error budget of any instance over the last 28 days.

      This will ensure that the Error Budget Policy kicks in once any instance has run out of Error Budget.
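      The following sketch shows how the minimum-based aggregation and the EBP trigger fit together. It rests on several assumptions that do not come from the real dashboard:
      • a 99.9% availability SLO;
      • per-instance availability already measured over the 28-day window;
      • an EBP that fires once remaining budget reaches zero.
      The function names are hypothetical.

```python
"""Sketch: aggregate remaining error budget across Kafka instances with min().

Assumptions (not taken from the real dashboard): a 99.9% availability SLO,
availability already measured per instance over the 28-day window, and an
Error Budget Policy (EBP) that triggers once remaining budget reaches zero.
"""
from typing import Mapping, Tuple

SLO_TARGET = 0.999  # hypothetical SLO target
WINDOW_DAYS = 28    # window the availability figures are assumed to cover


def remaining_error_budget(availability: float, slo_target: float = SLO_TARGET) -> float:
    """Fraction of error budget left for one instance over the window.

    1.0 means no budget spent, 0.0 means exactly exhausted, and a negative
    value means the instance has overspent its budget.
    """
    allowed_error = 1.0 - slo_target
    observed_error = 1.0 - availability
    return 1.0 - observed_error / allowed_error


def fleet_error_budget(per_instance: Mapping[str, float]) -> Tuple[str, float]:
    """Return the worst instance and its remaining budget (min aggregation).

    Raises ValueError if no instances are given.
    """
    budgets = {name: remaining_error_budget(a) for name, a in per_instance.items()}
    worst = min(budgets, key=lambda name: budgets[name])
    return worst, budgets[worst]


def ebp_triggered(per_instance: Mapping[str, float]) -> bool:
    """The EBP kicks in as soon as any single instance is out of budget."""
    _, budget = fleet_error_budget(per_instance)
    return budget <= 0.0


if __name__ == "__main__":
    instances = {"kafka-a": 0.99995, "kafka-b": 0.99990, "kafka-d": 0.99800}
    print(fleet_error_budget(instances))  # kafka-d is the worst instance
    print(ebp_triggered(instances))
```

      Taking the minimum means a single overspent instance drives the fleet-level figure, so the EBP cannot be masked by healthy instances. If the dashboard is backed by Prometheus (an assumption), this corresponds to wrapping the per-instance error-budget expression in the `min` aggregation operator instead of `avg`.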

      Move the individual Kafka instance panel so that it is the first of the two graphs shown.

      DONE

      Include the following where applicable:

      • <bulleted list of functional acceptance criteria that need to be completed>
      • <call out anything on the documentation side that's needed as a result of this task being completed>
      • <any metrics, monitoring dashboards and alerts that need to be created or be updated>
      • <SOP creation or updates>

      Guidelines

      The following steps should be adhered to:

      • Required tests should be put in place - unit, integration, manual test cases (if necessary)
      • CI and all relevant tests passing
      • Changes have been verified by one additional reviewer against:
          • each required environment
          • each supported upgrade path
      • If the changes could have an impact on the clients (either UI or CLI), a JIRA should be created for making the required changes on the client side and acknowledged by one of the client side team members.
      • PR has been merged

            rlawton@redhat.com Rachel Lawton
            tbentley-1 Tom Bentley
            MK - Running the Service
            Votes: 0
            Watchers: 3
