- Task
- Resolution: Obsolete
Overview
The goal of this Jira is to get a current snapshot of the metrics your service has available today for the Services Dashboard (FAQ). Here you’ll find two important sections:
- Definition of Done
- Clarity
  - Data Specs
  - Data Source Governance
Additional metrics may be added to the dashboard in the future, but you'll be engaged in a separate Jira for any new metrics we aim to collect.
If you’re having trouble answering the questions here, please reach out to aminter@redhat.com.
Definition of Done
Here are the metrics we're interested in:
Metric | How this metric is used |
---|---|
SLOs | A binary measurement: a service either meets its SLO or does not. If no SLO is defined or measured, that also counts as not meeting SLO. You’re eligible for either a score of 1 (Not meeting SLO) or a score of 4 (Meeting SLO). |
Incidents | An index of the number of incidents within the chosen time period in the dashboard. |
Lead Time to Change | Lead time in days from when a change is submitted to when it moves into your production pipeline. |
Mean Time to Restore | Restoration time in hours and minutes, taken from reported incident data. |
Deployment Frequency | Days that have deployments during a given time period. |
Change Failure Rate | Number of incidents / number of deployments |
There are two ways to call this work complete:
- Provide a data source where data can be found to support each metric.
  - The more precisely you can state “it will be located in this database, on X table,” the better. We understand that this won’t always be a simple hand-off and may take one or more conversations about your data source.
  - Ensure there is at least one person available to explain the data within the data source to our team. If that person is someone other than the assignee, document them here. This person may also be called upon to help our engineers collect the data so that it can be presented on the Services Dashboard.
- No data present in any data source to support the requested metric.
  - If you don’t have the data and won’t be able to collect and record it within a short time frame (e.g., the next 5 days), simply state that in the Jira for each metric.
  - This Jira is not looking to make “strikes” against teams that don’t have data; we just need to see where teams stand today.
  - If you have existing Stories or Epics related to gathering the requested metric, link them here.
Clarity
Data Specs
Treat each Data Spec below only as general guidance for what we’d like to see in your data source(s). At this time, the Services Dashboard does not have a standardized data spec for each of our metrics. This may change in the future, and any change will be reflected in future revisions of the Hybrid SRE process/templates.
SLOs
Field | Type | Notes |
---|---|---|
SLO Target | Numeric | Ex. 0.999 or 99.9% |
SLO Achieved | Numeric | Ex. 0.999 or 99.9% |
Timebox for SLO (in days) | Numeric | The timeframe you’re measuring your SLO within. Ex. 7, 14, 30 |
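As a rough illustration of the binary scoring described in the Definition of Done, the sketch below maps an SLO record to a score of 1 or 4. The `SloRecord` class, its field names, and the `slo_score` function are hypothetical and not part of any Services Dashboard API. It also assumes that values are stored as fractions (0.999) rather than percentages, and that "meeting SLO" means the achieved value is at least the target.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SloRecord:
    """Hypothetical record mirroring the SLO fields above."""
    target: Optional[float]    # e.g. 0.999; None if no SLO is defined
    achieved: Optional[float]  # e.g. 0.9995; None if the SLO is not measured
    timebox_days: int = 30     # measurement window, e.g. 7, 14, 30


def slo_score(record: SloRecord) -> int:
    """Return 4 when the SLO is met, 1 otherwise (including undefined or unmeasured)."""
    if record.target is None or record.achieved is None:
        return 1
    # Assumption: "met" means achieved >= target.
    return 4 if record.achieved >= record.target else 1
```

For example, `slo_score(SloRecord(target=0.999, achieved=0.9995))` yields 4, while a record with no target yields 1.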
Incidents
Field | Type | Notes |
---|---|---|
Incident ID | STRING | Some indicator or ID number that represents the incident record within your data source |
Cause of Incident | STRING | A reference to a commit, Jira, or other change that may represent part of the root cause |
Incident Start | TIMESTAMP | A timestamp indicating when the incident started |
Incident Close | TIMESTAMP | A timestamp indicating when the incident is officially closed |
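One way the incident count for a chosen time period could be derived from these fields is sketched below. The `count_incidents` function and the tuple layout are illustrative assumptions, as is the choice to bucket incidents by their start timestamp. How the dashboard turns this count into an index is not specified here.

```python
from datetime import datetime
from typing import Iterable, Tuple

# Each incident is (incident_id, incident_start); the layout is illustrative only.
Incident = Tuple[str, datetime]


def count_incidents(incidents: Iterable[Incident], start: datetime, end: datetime) -> int:
    """Count incidents whose start falls in the half-open window [start, end)."""
    return sum(1 for _, started in incidents if start <= started < end)
```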
Lead Time to Change
Field | Type | Notes |
---|---|---|
Commit ID | STRING | A unique hash or key that represents a specific commit to your service repo |
Commit Timestamp | TIMESTAMP | A timestamp of a given commit |
Merge Request ID | STRING | A unique hash or key that represents a specific MR within your Service Repo |
Merge Request Open Timestamp | TIMESTAMP | The date and time a Merge Request was opened |
Merge Request Closed Timestamp | TIMESTAMP | The date and time a Merge Request was closed (merged) |
Commit IDs within Merge Request | Array of STRING | A list of the commit IDs attached to the Merge Request |
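To make the Lead Time to Change calculation concrete, here is a minimal sketch. It assumes that the commit timestamp marks when a change is "submitted" and that the Merge Request closed (merged) timestamp marks when it enters the production pipeline; your service may define those boundaries differently. The function name `lead_time_days` is hypothetical.

```python
from datetime import datetime


def lead_time_days(commit_ts: datetime, merged_ts: datetime) -> float:
    """Days between a commit and the merge of the MR that contains it."""
    if merged_ts < commit_ts:
        raise ValueError("merge timestamp precedes commit timestamp")
    # 86400 seconds per day; fractional days are kept.
    return (merged_ts - commit_ts).total_seconds() / 86400
```

Joining each commit to its Merge Request through the "Commit IDs within Merge Request" field would supply the two timestamps.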
Mean Time to Restore
Field | Type | Notes |
---|---|---|
Incident ID | STRING | Some indicator or ID number that represents the incident record within your data source |
Incident Start | TIMESTAMP | A timestamp indicating when the incident started |
Incident Close | TIMESTAMP | A timestamp indicating when the incident is officially closed |
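A minimal sketch of how Mean Time to Restore could be computed from the start and close timestamps follows. The function name and the input layout are assumptions for illustration, and the result is truncated to whole minutes.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def mean_time_to_restore(windows: List[Tuple[datetime, datetime]]) -> Tuple[int, int]:
    """Return the mean (close - start) duration as (hours, minutes)."""
    if not windows:
        raise ValueError("no incidents to average")
    total = sum((close - start for start, close in windows), timedelta())
    # Truncate to whole minutes before averaging across incidents.
    mean_minutes = int(total.total_seconds() // 60) // len(windows)
    return divmod(mean_minutes, 60)
```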
Deployment Frequency
Field | Type | Notes |
---|---|---|
Merge Request ID | STRING | A unique hash or key that represents a specific MR within your Service Repo |
Merge Request Open Timestamp | TIMESTAMP | The date and time a Merge Request was opened |
Merge Request Closed Timestamp | TIMESTAMP | The date and time a Merge Request was closed (merged) |
Commit IDs within Merge Request | Array of STRING | A list of the commit IDs attached to the Merge Request |
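Since Deployment Frequency is defined as the days that have deployments in a given period, one possible calculation is sketched below. It treats each merged Merge Request as a deployment, which is an assumption suggested by the fields above rather than a stated rule. The function name `deployment_days` is hypothetical.

```python
from datetime import date, datetime
from typing import Iterable


def deployment_days(merged_timestamps: Iterable[datetime], start: date, end: date) -> int:
    """Count distinct calendar days in [start, end] with at least one merged MR."""
    days = {ts.date() for ts in merged_timestamps}
    return sum(1 for d in days if start <= d <= end)
```

Several merges on the same day count once, matching the "days that have deployments" wording.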
Change Failure Rate
Field | Type | Notes |
---|---|---|
Incident ID | STRING | Some indicator or ID number that represents the incident record within your data source |
Cause of Incident (Commit, Merge Request) | STRING | A commit/MR ID attached to an incident caused by change in code |
Merge Request ID | STRING | A unique hash or key that represents a specific MR within your Service Repo |
Merge Request Open Timestamp | TIMESTAMP | The date and time a Merge Request was opened |
Merge Request Closed Timestamp | TIMESTAMP | The date and time a Merge Request was closed (merged) |
Commit IDs within Merge Request | Array of STRING | A list of the commit IDs attached to the Merge Request |
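The Change Failure Rate ratio from the Definition of Done (number of incidents / number of deployments) can be sketched as follows. The function name is hypothetical. It assumes both counts are already restricted to the same time period, and whether only incidents with a change-related cause should be counted is not stated here, so that filtering is left to the caller.

```python
from typing import Optional


def change_failure_rate(incident_count: int, deployment_count: int) -> Optional[float]:
    """Incidents divided by deployments; None when there were no deployments."""
    if incident_count < 0 or deployment_count < 0:
        raise ValueError("counts must be non-negative")
    if deployment_count == 0:
        # The ratio is undefined without deployments; returning None is a design choice.
        return None
    return incident_count / deployment_count
```

For instance, 2 incidents across 8 deployments gives a rate of 0.25.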
Data Source Governance
The governance and quality of your data are important factors in this work. Today, our goal is to measure only the governance of the data source where you store your data. To measure Data Governance, we’ll pass the source you provide through a scorecard that can be found at this link.
The governance score of your data source will be shown on the Services Dashboard alongside your service’s data. The goal is to promote transparency with the leadership using the dashboard, but also to raise awareness of the ways we can holistically improve the governance of where Managed Services store this data today.
To read more on our approach, please check out the following blog post in our Source Space.