OpenShift Monitoring / MON-3058

Phase 0: Services Dashboard - Onboarding/Readiness


      Overview

      The goal of this Jira is to get a current snapshot of the metrics your service has available today for the Services Dashboard (FAQ). Here you’ll find two important sections:

      • Definition of Done
      • Clarity
        • Data Specs
        • Data Source Governance

      Additional metrics may be added to the dashboard in the future, but you'll be engaged in a separate Jira for any new metrics we aim to collect.

      If you’re having trouble answering the questions here, please reach out to aminter@redhat.com

      Definition of Done

      Here are the metrics we're interested in:

      Metric | How this metric is used
      SLOs | A binary measurement: a service either meets its SLO or does not. If no SLO is defined or measured, that also counts as not meeting the SLO. You're eligible for either a score of 1 (not meeting SLO) or a score of 4 (meeting SLO).
      Incidents | An index of the number of incidents within the time period chosen in the dashboard.
      Lead Time to Change | Lead time in days from when a change is submitted to when it moves into your production pipeline.
      Mean Time to Restore | Restoration time in hours and minutes, taken from reported incident data.
      Deployment Frequency | Number of days that have deployments during a given time period.
      Change Failure Rate | Number of incidents / number of deployments.

      There are two ways to call this work complete:

      1. Provide a data source where data can be found to support each metric. 
        • The more clearly you can state “it will be located in this database, on X table” the better. We understand that this won’t always be a one-step handoff and may take one or more conversations about your data source.
        • Ensure there is at least one person available to explain the data within the data source to our team - document that person here if someone other than the assignee. Additionally, this person may be called upon to assist our engineers with collecting the data so that it can be presented on the Services Dashboard.
      2. No data present in any data source to support the requested metric.
        • If you don’t have the data, and won’t be able to collect and record it within a short time frame (e.g., the next 5 days), simply state that in the Jira for each metric.
        • This Jira is not looking to make “strikes” against teams that don’t have data; we just need to see where teams stand today.
        • If you have existing Stories or Epics that relate to work around gathering the requested metric, definitely link those here.

      Clarity

      Data Specs

      Treat each Data Spec below only as general guidance for what we’d like to see in your data source(s). At this time, the Services Dashboard does not have a standardized data spec for each of our metrics. This may change in the future, and any change will be reflected in future revisions of the Hybrid SRE process/templates.

       

      SLOs

      Field | Type | Notes
      SLO Target | Numeric | Ex. 0.999 or 99.9%
      SLO Achieved | Numeric | Ex. 0.999 or 99.9%
      Timebox for SLO (in days) | Numeric | The timeframe you’re measuring your SLO within. Ex. 7, 14, 30
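
As a rough illustration of how these three fields could feed the binary SLO score described under Definition of Done, here is a minimal Python sketch. The `SloRecord` class, the `normalize` helper, and the rule "achieved >= target means the SLO is met" are assumptions made for this example, not the dashboard's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SloRecord:
    """Hypothetical record mirroring the three SLO fields above."""
    target: Optional[float]    # e.g. 0.999 or 99.9; None if no SLO is defined
    achieved: Optional[float]  # e.g. 0.9995 or 99.95; None if not measured
    timebox_days: int = 30     # e.g. 7, 14, 30


def normalize(value: float) -> float:
    """Accept either a fraction (0.999) or a percentage (99.9).

    Assumption: any value above 1.0 is a percentage.
    """
    return value / 100.0 if value > 1.0 else value


def slo_score(record: SloRecord) -> int:
    """Return 4 when the SLO is met, otherwise 1.

    A missing target or measurement counts as not meeting the SLO.
    """
    if record.target is None or record.achieved is None:
        return 1
    met = normalize(record.achieved) >= normalize(record.target)
    return 4 if met else 1
```

Storing the target and achieved values in the same unit avoids relying on the fraction-versus-percentage guess in `normalize`.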

       

      Incidents

      Field | Type | Notes
      Incident ID | STRING | Some indicator or ID number that represents the incident record within your data source
      Cause of Incident | STRING | A reference to a commit, Jira, or other change that may represent part of the root cause
      Incident Start | TIMESTAMP | A timestamp indicating when the incident started
      Incident Close | TIMESTAMP | A timestamp indicating when the incident is officially closed

       

      Lead Time to Change

      Field | Type | Notes
      Commit ID | STRING | A unique hash or key that represents a specific commit to your service repo
      Commit Timestamp | TIMESTAMP | A timestamp of a given commit
      Merge Request ID | STRING | A unique hash or key that represents a specific MR within your service repo
      Merge Request Open Timestamp | TIMESTAMP | The date and time a Merge Request was opened
      Merge Request Closed Timestamp | TIMESTAMP | The date and time a Merge Request was closed (merged)
      Commit IDs within Merge Request | Array of STRING | A list of the commit IDs attached to the Merge Request
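
To make the intent of these fields concrete, the sketch below computes a lead time in days for each commit in a Merge Request. It assumes that the commit timestamp marks when a change is "submitted" and that the Merge Request closed (merged) timestamp marks when it "moves into your production pipeline"; your service may define those points differently. The dictionary keys and function names are hypothetical.

```python
from datetime import datetime


def lead_time_days(commit_ts: datetime, merged_ts: datetime) -> float:
    """Days elapsed between a commit and the merge of its Merge Request."""
    return (merged_ts - commit_ts).total_seconds() / 86400.0


def mr_lead_times(merge_request: dict) -> list:
    """Lead time in days for every commit attached to one Merge Request."""
    merged_ts = merge_request["closed_at"]
    return [
        lead_time_days(commit["timestamp"], merged_ts)
        for commit in merge_request["commits"]
    ]


# Hypothetical example record; the IDs and timestamps are made up.
example_mr = {
    "merge_request_id": "mr-1",
    "closed_at": datetime(2023, 5, 10, 12, 0),
    "commits": [
        {"commit_id": "abc", "timestamp": datetime(2023, 5, 8, 12, 0)},
        {"commit_id": "def", "timestamp": datetime(2023, 5, 10, 0, 0)},
    ],
}
```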

       

      Mean Time to Restore

      Field | Type | Notes
      Incident ID | STRING | Some indicator or ID number that represents the incident record within your data source
      Incident Start | TIMESTAMP | A timestamp indicating when the incident started
      Incident Close | TIMESTAMP | A timestamp indicating when the incident is officially closed
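
One way the start and close timestamps could be turned into a restoration time in hours and minutes is sketched below. The function names are hypothetical, and skipping incidents that are still open is an assumption of this example rather than a stated dashboard rule.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# Each incident is a (start, close) pair; close is None while still open.
Incident = Tuple[datetime, Optional[datetime]]


def mean_time_to_restore(incidents: List[Incident]) -> Optional[timedelta]:
    """Average of (close - start) over closed incidents, or None if there are none."""
    durations = [close - start for start, close in incidents if close is not None]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)


def format_hours_minutes(delta: timedelta) -> str:
    """Render a duration as hours and minutes, dropping leftover seconds."""
    total_minutes = int(delta.total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h {minutes:02d}m"
```

For example, two incidents lasting 90 minutes and 30 minutes average to one hour, which `format_hours_minutes` renders as `1h 00m`.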

       

      Deployment Frequency

      Field | Type | Notes
      Merge Request ID | STRING | A unique hash or key that represents a specific MR within your service repo
      Merge Request Open Timestamp | TIMESTAMP | The date and time a Merge Request was opened
      Merge Request Closed Timestamp | TIMESTAMP | The date and time a Merge Request was closed (merged)
      Commit IDs within Merge Request | Array of STRING | A list of the commit IDs attached to the Merge Request
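
Since Deployment Frequency is described as the days that have deployments during a given time period, a small sketch of that count follows. It assumes that a merged Merge Request corresponds to a deployment, which may not hold for every service; the function name and parameters are hypothetical.

```python
from datetime import date, datetime
from typing import Iterable


def deployment_days(
    merged_timestamps: Iterable[datetime],
    period_start: date,
    period_end: date,
) -> int:
    """Count distinct calendar days in [period_start, period_end] with a merge.

    Several merges on the same day count as a single deployment day.
    """
    days = {
        ts.date()
        for ts in merged_timestamps
        if period_start <= ts.date() <= period_end
    }
    return len(days)
```

If your timestamps carry time zones, convert them to one zone before calling this, since the calendar day of a merge depends on the zone.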

       

      Change Failure Rate

      Field | Type | Notes
      Incident ID | STRING | Some indicator or ID number that represents the incident record within your data source
      Cause of Incident (Commit, Merge Request) | STRING | A commit/MR ID attached to an incident caused by a change in code
      Merge Request ID | STRING | A unique hash or key that represents a specific MR within your service repo
      Merge Request Open Timestamp | TIMESTAMP | The date and time a Merge Request was opened
      Merge Request Closed Timestamp | TIMESTAMP | The date and time a Merge Request was closed (merged)
      Commit IDs within Merge Request | Array of STRING | A list of the commit IDs attached to the Merge Request
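
The ratio of incidents to deployments could be derived from these fields roughly as follows. This sketch assumes that each merged Merge Request counts as one deployment and that only incidents whose cause matches a known MR ID or commit ID are counted; both choices are illustrative, and the dictionary keys are hypothetical.

```python
from typing import List, Optional


def change_failure_rate(
    incidents: List[dict],
    merge_requests: List[dict],
) -> Optional[float]:
    """Incidents attributed to a change, divided by the number of deployments.

    Returns None when there are no deployments, to avoid dividing by zero.
    """
    if not merge_requests:
        return None

    # Collect every MR ID and commit ID that an incident cause could reference.
    known_changes = set()
    for mr in merge_requests:
        known_changes.add(mr["merge_request_id"])
        known_changes.update(mr["commit_ids"])

    failures = sum(1 for inc in incidents if inc.get("cause") in known_changes)
    return failures / len(merge_requests)
```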

      Data Source Governance

      The governance and quality of your data are important factors in this work. Today, our goal is to measure only the governance of the data source you store your data in. To measure Data Governance, we’ll pass the source you provide through a scorecard that can be found at this link.

       

      The governance score of your data source will be shown on the Services Dashboard alongside your service’s data. The goal is to promote transparency with the leadership using the dashboard, but also to raise awareness of the ways we can holistically improve the governance of where Managed Services store this data today.

       

      To read more on our approach, please check out the following blog post in our Source Space.

              Unassigned Unassigned
              mmazur@redhat.com Mariusz Mazur