Currently, we offer several disjoint experiences because each signal has its own tooling: Prometheus UI and Grafana for metrics, and Kibana for logs. For metrics, we have already started to bring the most critical views into the Console, and we want to do the same for logs. That will give us much more flexibility in how we present data, so that SREs can easily correlate information in the future without switching between different UIs.
Enabling observability is a key aspect of our investment in different areas. One of these areas is to bring everything that helps SREs troubleshoot problems into a single, native experience inside the OpenShift Console.
Some important points to consider:
- A unified experience inside the OpenShift Console makes future investments to improve the discoverability of meaningful data much easier by looking at the complete picture holistically.
- Not using third-party UIs will simplify our entire auth mechanism, since there is only one access point.
- When we introduce a new, improved storage with LOG-704, we will need a replacement UI, since we can no longer use Kibana.
- Minimizes deployment size and complexity of the Logging stack.
Enhance the current "Metrics" explorer view by introducing the ability to switch to looking at logs instead of metrics. We should also rename "Metrics" to "Explorer" to reflect that it's now not just metrics but also logs a user could explore.
The expectation is that most of the "Exploration" features such as "export", "visualize", "create alerting rule", and others are capabilities shared across both signals.
Goal & Success
- A unified experience inside the OpenShift Console that makes future investments to improve the discoverability of meaningful data easier just by looking at the complete picture holistically.
- Allow users to query logs directly (a grep-like experience is requested fairly often). The engine should only return logs that a user has access to. OpenShift Console currently only supports projects/namespaces as a way to separate tenants. Here, "admins" should have access to all logs as before, but "developers" should only see logs from a single namespace.
- Ability to further filter logs easily by, for example, selecting a specific timeframe.
- Ability to export current result into a TEXT or CSV file.
- Ability to visualize query results as a graph.
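To make the query, filter, and export goals above more concrete, here is a minimal sketch in Python. It is not tied to any real log store or Console API: the `LogEntry` record, the `query_logs` and `export_csv` functions, and the `"admin"`/`"developer"` role strings are all hypothetical names chosen for illustration. It assumes a plain substring match stands in for the grep-like query and that the time window is inclusive on both ends.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class LogEntry:
    """Hypothetical log record; the field names are illustrative only."""
    timestamp: datetime
    namespace: str
    message: str


def query_logs(
    entries: Iterable[LogEntry],
    pattern: str,
    role: str,
    namespace: Optional[str] = None,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
) -> List[LogEntry]:
    """Grep-like substring search restricted by role, namespace, and time."""
    if role not in ("admin", "developer"):
        raise ValueError(f"unknown role: {role!r}")
    # Assumption: a developer must always be scoped to a single namespace.
    if role == "developer" and namespace is None:
        raise PermissionError("developers must query a single namespace")

    results = []
    for entry in entries:
        # Admins see everything unless they choose to narrow by namespace.
        if namespace is not None and entry.namespace != namespace:
            continue
        # Inclusive time window; either bound may be omitted.
        if start is not None and entry.timestamp < start:
            continue
        if end is not None and entry.timestamp > end:
            continue
        if pattern in entry.message:
            results.append(entry)
    return results


def export_csv(entries: Iterable[LogEntry]) -> str:
    """Render the current result set as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["timestamp", "namespace", "message"])
    for entry in entries:
        writer.writerow(
            [entry.timestamp.isoformat(), entry.namespace, entry.message]
        )
    return buffer.getvalue()
```

For example, `query_logs(entries, "error", role="developer", namespace="team-a")` would return only matching entries from `team-a`, and passing that result to `export_csv` would yield text suitable for a file download. In a real implementation the access check would presumably be enforced by the storage backend rather than by client-side filtering.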
Open Questions & Key Decisions (optional)
- is incorporated by OBSDA-110 Correlation of observability signals
- To Do