Background

Application observability

Historically, observability in OpenShift has focused mostly on platform observability. Its pillars also offered some great capabilities for applications, but the platform fell short on:

Providing capabilities designed to observe applications as a whole in the UI
Providing a unified UI for all pillars
Providing a unified and easy installation

The following have since been added (to name just a few):

Application Performance Monitoring (APM) view, which needs the following to be installed and properly configured:
- OpenTelemetry Collector producing metrics out of traces
- Prometheus to collect those metrics
- Tempo stack
- UI plugin (now installed via the Cluster Observability Operator, COO)
Distributed tracing plugin, which needs COO and tenancy for read and write operations to provide the following features:
- Traces scatter plot
- Traces table
- Gantt chart
- Links between spans and pod logs and metrics
Tempo Monolithic deployment, which helps users easily install an in-memory tracing stack
Highly configurable Prometheus stack as part of COO
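
As an illustration of the "metrics out of traces" step that the APM view depends on, the sketch below aggregates spans into per-service request rate, error ratio, and mean duration. It is plain Python, not the OpenTelemetry Collector's actual implementation; the `Span` class, its fields, and the `red_metrics` function are hypothetical names chosen for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical, simplified span; real spans carry many more fields."""
    service: str
    duration_ms: float
    is_error: bool


def red_metrics(spans, window_seconds):
    """Aggregate spans into per-service rate, error ratio, and mean duration."""
    grouped = defaultdict(list)
    for span in spans:
        grouped[span.service].append(span)

    result = {}
    for service, items in grouped.items():
        count = len(items)
        errors = sum(1 for s in items if s.is_error)
        result[service] = {
            "rate_per_second": count / window_seconds,
            "error_ratio": errors / count,
            "mean_duration_ms": sum(s.duration_ms for s in items) / count,
        }
    return result


# Example: three spans observed over an assumed 60-second window.
spans = [
    Span("checkout", 120.0, False),
    Span("checkout", 80.0, True),
    Span("cart", 30.0, False),
]
metrics = red_metrics(spans, window_seconds=60)
```

In the stack described above, a metrics backend such as Prometheus would then collect the resulting series, and the UI would chart them per service.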

With these additions, application observability in OpenShift has all the components needed to provide a good user experience. However, installing it is not straightforward, and many users fall back to other options for application observability, such as third-party observability vendors or open source solutions (for example, Grafana + LGTM stack).

Defining and using tenants

Also, after both the security hardening in the Jaeger UI and the addition of the Tracing plugin to the console, it is mandatory to have well-defined tenants for both read and write. While this improves security, it may make it harder for users to:

Start using distributed tracing
Configure environments
Easily debug issues by quickly installing a Tempo instance to troubleshoot ongoing incidents
Install tracing in development environments

Requirements

That's why, as part of this feature, a solution shall be delivered that:

Installs an application observability stack that provides metrics and traces collection, storage and visualization, including:
- APM that includes relevant RED metrics
- Distributed Tracing to monitor and troubleshoot the path of a request and link it to other signals
Installs one or multiple tenants via RBAC based on customer needs. The following may be considered:
- Optional default tenant installation
- Let the user define "who" is writing and "who" is reading ("who" means which tenant)
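
To make the tenancy requirement more concrete, here is a minimal sketch of how a per-tenant read/write definition with an optional default tenant could be modeled. It is only a data-model illustration under stated assumptions, not an RBAC implementation or an existing operator API; `TenantAccess`, `build_tenants`, `is_allowed`, and the tenant name `default` are all hypothetical.

```python
from dataclasses import dataclass

DEFAULT_TENANT = "default"  # assumed name for the optional default tenant


@dataclass(frozen=True)
class TenantAccess:
    """Hypothetical record stating whether a tenant may write and/or read."""
    name: str
    can_write: bool
    can_read: bool


def build_tenants(requested, install_default=True):
    """Index the requested tenants by name, optionally adding a default one."""
    tenants = {t.name: t for t in requested}
    if install_default:
        # Keep a user-defined tenant with the same name if one exists.
        tenants.setdefault(
            DEFAULT_TENANT,
            TenantAccess(DEFAULT_TENANT, can_write=True, can_read=True),
        )
    return tenants


def is_allowed(tenants, tenant_name, action):
    """Return True if the named tenant may perform 'read' or 'write'."""
    tenant = tenants.get(tenant_name)
    if tenant is None:
        return False
    if action == "write":
        return tenant.can_write
    if action == "read":
        return tenant.can_read
    return False


# Example: one tenant that writes and reads, one that only reads.
tenants = build_tenants([
    TenantAccess("dev", can_write=True, can_read=True),
    TenantAccess("auditors", can_write=False, can_read=True),
])
```

Unknown tenants and unknown actions are denied by default, which keeps the security improvement described earlier while letting a development setup rely on the default tenant alone.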

Future steps

To be discussed whether we can deliver them now:
Logs can remain part of the pod console output for now, because they always need an object store (there is no monolithic option); open to suggestions, or to extending this in future improvements
Dashboarding for applications

List any affected packages or components.

Red Hat build of OpenTelemetry
Tempo
Tracing plugin

Assignee:: Jose Gomez-Selles

Reporter:: Jose Gomez-Selles

Created:: 2024/10/21 9:58 AM

Updated:: 2024/11/05 3:57 PM
