-
Epic
-
Resolution: Done
-
Cluster troubleshooting journey
-
QE Needed, Docs Needed, TE Needed, Customer Facing, PX Needed
-
0% To Do, 0% In Progress, 100% Done
Description
As an OpenShift Observability user, I want to make use of Observability data and correlated signals via the OpenShift UI to ease my day-to-day debug journey.
As part of this journey, we will explore introducing an Observability Overview page. This could serve as a central location for Observability, where SREs could check cluster health status and see critical alerts. This page could also serve as the starting point for a cluster troubleshooting journey.
We also want to leverage Korrel8r (signal correlation) technology for this effort.
Note from Alan Conway (Architect, Correlation of Signals Initiative)
The traditional observable signals in a cluster include:
- Logs (text records emitted by containers, structured or unstructured records)
- Metrics (numeric values collected periodically)
- Alerts (structured records indicating an important transition in metric values)
- Traces (coming soon, tree-structured records of function calls or network requests, traceable across multiple containers)
We also consider these to be signals:
- K8s Events (effectively structured logs, stored as API objects instead of log file records)
- Network Events (coming soon, records of network-level events)
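To make the signal list above concrete, here is a minimal sketch of how these signal types might be modeled and related to each other. It is only an illustration under stated assumptions: the `Signal` class, the `correlate` function, the label keys `namespace` and `pod`, and the five-minute window are all hypothetical, and this is not Korrel8r's API or rule engine. It only shows the general idea of starting from one signal (for example an alert) and finding other signals that share identifying labels and occurred close in time.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from enum import Enum


class SignalKind(Enum):
    """The signal types named in the note above."""
    LOG = "log"
    METRIC = "metric"
    ALERT = "alert"
    TRACE = "trace"
    K8S_EVENT = "k8s_event"
    NETWORK_EVENT = "network_event"


@dataclass
class Signal:
    """Hypothetical common shape for any observable signal."""
    kind: SignalKind
    timestamp: datetime
    labels: dict = field(default_factory=dict)
    body: str = ""


def correlate(start, candidates, keys=("namespace", "pod"),
              window=timedelta(minutes=5)):
    """Return candidates that share `keys` labels with `start`
    and fall within `window` of its timestamp (toy heuristic)."""
    related = []
    for s in candidates:
        if s is start:
            continue  # do not relate a signal to itself
        same_labels = all(
            k in start.labels and start.labels[k] == s.labels.get(k)
            for k in keys
        )
        close_in_time = abs(s.timestamp - start.timestamp) <= window
        if same_labels and close_in_time:
            related.append(s)
    return related


# Usage: start from an alert and look for related signals.
t0 = datetime(2024, 1, 1, 12, 0, 0)
pod_labels = {"namespace": "shop", "pod": "cart-1"}
alert = Signal(SignalKind.ALERT, t0, pod_labels, "pod is crash looping")
log = Signal(SignalKind.LOG, t0 - timedelta(minutes=1), pod_labels, "panic")
other = Signal(SignalKind.LOG, t0, {"namespace": "shop", "pod": "cart-2"}, "ok")
old = Signal(SignalKind.K8S_EVENT, t0 - timedelta(hours=1), pod_labels, "BackOff")

related = correlate(alert, [alert, log, other, old])
print([s.kind.value for s in related])  # prints ['log']
```

A real correlation engine would use per-signal-type rules rather than one fixed label match, since logs, metrics, and API objects identify their sources differently. The sketch only illustrates why a shared model of signals is a useful starting point for a troubleshooting journey.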
Other aspects to this effort could be:
- Alert grouping
- Anomaly detection
- Incidents
Goals & Outcomes
Product Requirements:
Map out a troubleshooting journey across one or more clusters for SREs and Platform Engineers, from start to finish.
Engineering/Data Analytics Requirements:
[List here]
Success KPIs
[If applicable]
Documentation
Open Questions
Admin vs Developer Perspective?