
Type: Epic
Resolution: Done
Priority: Major
Fix Version/s: None
Affects Version/s: None
Component/s: Correlation
Labels:
None

Epic Name:
Correlation Engine
Blocked:
False
Ready:
False
Docs QE Status:
NEW
Epic Status:
To Do
Flagged:

Impediment
QE Status:
NEW
Hierarchy Progress Bar:

0% To Do, 0% In Progress, 100% Done
Release Note Type:
Release Note Not Required

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

Goals

A cluster service with a REST API that provides correlated data to clients such as console components and CLI tools. The service:

- Accepts a correlation context as a query.
- Applies correlation rules to form queries to multiple back-ends: signal stores and cluster APIs.
- Returns the queries to the caller, which executes them.
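A minimal sketch of the "rules form queries, caller executes" flow, assuming a rule is simply a function from a context to a list of native queries. The `Context`, `Query`, and rule names are hypothetical; the Loki stream labels (`kubernetes_namespace_name`, `kubernetes_pod_name`) and the cAdvisor metric `container_cpu_usage_seconds_total` are assumed examples, not a decided rule set.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Query:
    backend: str  # which back-end should run this, e.g. "loki" or "prometheus"
    text: str     # query in that back-end's native language (LogQL, PromQL, ...)
    start: int    # time window, Unix seconds; passed through to the back-end API
    end: int


@dataclass
class Context:
    namespace: str
    pod: str
    start: int
    end: int


# Hypothetical shape: a rule turns a context into zero or more native queries.
Rule = Callable[[Context], list[Query]]


def logs_for_pod(ctx: Context) -> list[Query]:
    # Assumed Loki stream labels for OpenShift logging.
    selector = f'{{kubernetes_namespace_name="{ctx.namespace}", kubernetes_pod_name="{ctx.pod}"}}'
    return [Query("loki", selector, ctx.start, ctx.end)]


def cpu_for_pod(ctx: Context) -> list[Query]:
    # Assumed cAdvisor metric and standard K8s labels.
    selector = f'container_cpu_usage_seconds_total{{namespace="{ctx.namespace}", pod="{ctx.pod}"}}'
    return [Query("prometheus", selector, ctx.start, ctx.end)]


def plan(ctx: Context, rules: list[Rule]) -> list[Query]:
    """Apply every rule and return the queries; the engine does not execute them."""
    queries: list[Query] = []
    for rule in rules:
        queries.extend(rule(ctx))
    return queries
```

Returning queries rather than results keeps the engine stateless: the caller already has credentials for each back-end and can stream the results itself.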

Correlation API

The API must:

Define a correlation context, including:
- time window
- related resources
- alert and metric types
- desired result type(s) - signals, resources
- should be extensible to handle additional context data types
Define result format(s).
- Results must use streams and/or iterators for high-volume results (logs, traces)
- See open questions on result format.
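One possible shape for the context and for streamed results, as a sketch only: the field names, the `"pod/<namespace>/<name>"` resource string, and the `extra` map used as the extensibility hook are all assumptions, and batching is just one way to satisfy the streams/iterators requirement.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Any, Iterator


class ResultType(Enum):
    SIGNALS = "signals"
    RESOURCES = "resources"


@dataclass
class CorrelationContext:
    start: datetime  # time window
    end: datetime
    # Related resources; the "pod/<namespace>/<name>" form is hypothetical.
    resources: list[str] = field(default_factory=list)
    alerts: list[str] = field(default_factory=list)   # alert types
    metrics: list[str] = field(default_factory=list)  # metric types
    result_types: set[ResultType] = field(default_factory=lambda: {ResultType.SIGNALS})
    # Extension point: additional context data types keyed by name.
    extra: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.end <= self.start:
            raise ValueError("time window end must be after start")


def stream_results(records: Iterator[str], batch_size: int = 1000) -> Iterator[list[str]]:
    """Yield results in fixed-size batches so high-volume signals are never fully buffered."""
    batch: list[str] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch
```

For example, `CorrelationContext(start=t0, end=t1, resources=["pod/demo/web-1"])` asks for signals about one pod in a window; a new context type can be added under `extra` without changing the API.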

Meta-data dictionaries

A uniform data model may be a long-term goal, but currently we have (at least) these distinct data models:

- OpenTelemetry: used by tracing; a candidate for the eventual unified model.
- ViaQ: used by logging.
- OpenShift/K8s/Prometheus: used by alerts and metrics. There is no formal spec that I'm aware of, but there are strong conventions, based on the Prometheus style, for naming metrics and metric labels in K8s and OpenShift.

We need dictionaries to translate between these data models so that we can:

- formulate native queries for each back-end;
- normalize results returned from each back-end.
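A dictionary can be as simple as a table from a canonical key to each model's native name, used in one direction to build queries and in the other to normalize results. This is a sketch assuming OpenTelemetry-style keys as the canonical form; the ViaQ and Prometheus names shown are illustrative and should be checked against each model's conventions.

```python
# Canonical (OpenTelemetry-style) key -> native field or label name per data model.
# Entries are illustrative, not a complete or verified mapping.
DICTIONARY: dict[str, dict[str, str]] = {
    "k8s.namespace.name": {"viaq": "kubernetes.namespace_name", "prometheus": "namespace"},
    "k8s.pod.name": {"viaq": "kubernetes.pod_name", "prometheus": "pod"},
}


def to_native(model: str, attrs: dict[str, str]) -> dict[str, str]:
    """Canonical -> native names, for formulating queries. Unmapped keys are dropped."""
    return {
        DICTIONARY[key][model]: value
        for key, value in attrs.items()
        if key in DICTIONARY and model in DICTIONARY[key]
    }


def to_canonical(model: str, native: dict[str, str]) -> dict[str, str]:
    """Native -> canonical names, for normalizing results. Unmapped keys pass through."""
    reverse = {names[model]: key for key, names in DICTIONARY.items() if model in names}
    return {reverse.get(key, key): value for key, value in native.items()}
```

Dropping unknown keys when building queries avoids sending a back-end a label it does not have, while passing them through when normalizing avoids silently losing data.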

Correlation Engine Implementation

The engine should start small but be able to grow by adding:

- New back-ends (initially Loki, Prometheus, Jaeger, and K8s events and resources).
- New correlation rules: ways to handle specific types of context more accurately.
- New data model dictionaries.

Extensions may be static code, plug-ins, declarative data or some combination.

Whatever the form, the engine must have clear extension points.
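As one illustration of a clear extension point, here is a sketch of a back-end interface plus a registry, assuming static code rather than plug-ins or declarative data. `Backend`, `PrometheusBackend`, and `Registry` are hypothetical names, not a committed design.

```python
from typing import Protocol


class Backend(Protocol):
    """Hypothetical extension point: one implementation per signal store or cluster API."""

    name: str

    def supports(self, signal: str) -> bool: ...

    def build_query(self, signal: str, selector: dict[str, str]) -> str: ...


class PrometheusBackend:
    name = "prometheus"

    def supports(self, signal: str) -> bool:
        return signal == "metrics"

    def build_query(self, signal: str, selector: dict[str, str]) -> str:
        # Build a PromQL label-matcher selector, sorted for stable output.
        matchers = ", ".join(f'{k}="{v}"' for k, v in sorted(selector.items()))
        return "{" + matchers + "}"


class Registry:
    """Adding a back-end means registering one object; the engine core is unchanged."""

    def __init__(self) -> None:
        self._backends: dict[str, Backend] = {}

    def register(self, backend: Backend) -> None:
        self._backends[backend.name] = backend

    def for_signal(self, signal: str) -> list[Backend]:
        return [b for b in self._backends.values() if b.supports(signal)]
```

The same pattern could later load implementations from plug-ins or configuration, since the engine only depends on the interface.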

Non-Goals

See ~~OBSDA-110~~

Motivation

See ~~OBSDA-110~~

Alternatives

Rely on third-party tools to meet our customers' correlation needs.

Acceptance Criteria

This Epic is complete when we have a correlation engine as described above that is sufficient to be released as GA.

The engine will be built so that we can have multiple "checkpoint" releases, either internally or for customer preview.

We will refine our idea of what "GA ready" means by experimenting and getting feedback on those checkpoints.

Risk and Assumptions

Risk: The project is ambitious and risks getting bogged down.

Remedy: Start small and grow.

- Small minimum feature set for the initial preview; grow incrementally.
- GA when preview feedback indicates we have "enough".
- Keep growing in subsequent releases.

Risk: Correlation results are not satisfactory.

Remedy: Early investigation suggests we can produce valuable correlations, but there is no way to be sure until we try it.

Risk: We cannot compete or keep pace with third-party tools on advanced features.

Remedy: See ~~LOG-1779~~ Value Proposition

Documentation Considerations

Documentation of (or self-documenting) console components for console-based use.

Documentation of query language, result formats etc. for CLI-based use if we decide to support that.

Open Questions

Should results be:

normalized to a single consistent format and data model? (OpenTelemetry?)
returned in native form for each signal type?
available in both forms, based on user preference?
- New users are likely to prefer consistent output.
- Existing users have their own tools and queries based on the existing data models.
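If the answer is "both", the choice could be a per-request option. A minimal sketch, assuming a hypothetical `ResultFormat` parameter and a single illustrative ViaQ-to-OpenTelemetry rename; it does not presume which default the API should use.

```python
from enum import Enum


class ResultFormat(Enum):
    NATIVE = "native"          # records exactly as the back-end returned them
    NORMALIZED = "normalized"  # records renamed to the common (OpenTelemetry-style) model


# Illustrative native -> normalized renames for logging records (hypothetical subset).
VIAQ_TO_OTEL = {"kubernetes.namespace_name": "k8s.namespace.name"}


def render(record: dict[str, str], fmt: ResultFormat) -> dict[str, str]:
    """Return a copy of the record in the format the caller asked for."""
    if fmt is ResultFormat.NATIVE:
        return dict(record)
    # Unknown fields keep their native names rather than being dropped.
    return {VIAQ_TO_OTEL.get(key, key): value for key, value in record.items()}
```

New users could rely on the normalized form while existing users keep their current tools by requesting the native form.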

Additional Notes

is incorporated by: OBSDA-110 Correlation of observability signals (Closed)

Assignee:: Alan Conway

Reporter:: Alan Conway

Votes:: 0

Watchers:: 5

Created:: 2022/01/13 8:33 PM

Updated:: 2024/02/07 2:16 PM

Resolved:: 2024/02/05 7:47 PM
