Story
Resolution: Done
Normal
Goal
A correlation service that queries multiple signal stores to return correlated signal data.
- Define a "correlation context" that contains the information used to find correlated signals.
  A correlation context can be derived from:
  - An alert instance
  - Cluster resources
- Design an API that accepts a correlation context and returns a set of queries for correlated signals.
  - Define the representation of a correlation context.
  - Define the representations used to return queries.
- For each use case identified by LOG-1811:
  - Write down the relevant correlation context(s).
  - Design the corresponding correlation queries to retrieve correlated items from each signal store.
  - Automate the transformation from context to query.
- Test the queries on a sample cluster to validate the relevance of the signals they return.
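To make the "representation" items above concrete, here is a minimal sketch of a correlation context and a returned query, assuming a context is reduced to a namespace plus an optional resource kind/name, and that alert instances expose their labels as a string map. All names here (`CorrelationContext`, `CorrelationQuery`, `context_from_alert`, `context_from_resource`, `to_json`) are hypothetical illustrations, not a defined API, and JSON is only one possible choice for the "well defined format".

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, Optional


@dataclass
class CorrelationContext:
    """Hypothetical correlation context: where to look for correlated signals."""
    namespace: str
    kind: Optional[str] = None       # resource kind, e.g. "Pod" (None if unknown)
    name: Optional[str] = None       # resource name (None if unknown)
    alert_labels: Dict[str, str] = field(default_factory=dict)  # empty unless derived from an alert


@dataclass
class CorrelationQuery:
    """Hypothetical returned query: which store to ask and what to ask it."""
    store: str      # e.g. "logs", "metrics", "traces", "k8s-events"
    language: str   # e.g. "LogQL", "PromQL"
    query: str


def context_from_alert(labels: Dict[str, str]) -> CorrelationContext:
    """Derive a context from an alert instance's labels.

    Assumes the alert carries a "namespace" label and, for pod-level
    alerts, a "pod" label.
    """
    return CorrelationContext(
        namespace=labels.get("namespace", ""),
        kind="Pod" if "pod" in labels else None,
        name=labels.get("pod"),
        alert_labels=dict(labels),
    )


def context_from_resource(kind: str, namespace: str, name: str) -> CorrelationContext:
    """Derive a context from a cluster resource."""
    return CorrelationContext(namespace=namespace, kind=kind, name=name)


def to_json(obj) -> str:
    """Serialize a context or query as JSON (one possible wire format)."""
    return json.dumps(asdict(obj), sort_keys=True)
```

Keeping alert-derived and resource-derived contexts in one type lets the API accept a single input shape regardless of where the context came from.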
Assume the following signal stores:
- logs (Loki, LogQL)
- metrics (Prometheus, PromQL)
- traces (Tempo)
- k8s events (k8s Event API)
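As an illustration of the context-to-query transformation for these stores, the sketch below builds one query per store for a pod-scoped context. It rests on several assumptions: log streams are labeled with `kubernetes_namespace_name` and `kubernetes_pod_name`, metrics carry `namespace` and `pod` labels, spans carry OpenTelemetry `k8s.namespace.name`/`k8s.pod.name` resource attributes, and the inputs are valid Kubernetes names, so no escaping is done. The function name `queries_for_pod` is hypothetical.

```python
from typing import Dict


def queries_for_pod(namespace: str, pod: str) -> Dict[str, str]:
    """Build one query per signal store for a pod-scoped context.

    Assumes namespace and pod are valid Kubernetes (DNS-1123) names, so no
    quoting or escaping is needed inside the query strings.
    """
    return {
        # LogQL stream selector; label names are an assumption about the log schema.
        "logs": f'{{kubernetes_namespace_name="{namespace}", kubernetes_pod_name="{pod}"}}',
        # PromQL selector matching every series that carries these labels.
        "metrics": f'{{namespace="{namespace}", pod="{pod}"}}',
        # TraceQL; only useful if spans carry OpenTelemetry k8s resource attributes.
        "traces": f'{{ resource.k8s.namespace.name = "{namespace}" && resource.k8s.pod.name = "{pod}" }}',
        # Field selector for the k8s Event API.
        "k8s-events": f"involvedObject.namespace={namespace},involvedObject.kind=Pod,involvedObject.name={pod}",
    }
```

The `k8s-events` string can be passed to the Event API as a field selector, for example with `kubectl get events --field-selector <selector>`.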
Non-Goal
At this point the goal is to collect and review correlated data, not to decide how to present it in a UI.
Acceptance Criteria
- A working POC that produces useful queries from the alerts listed in LOG-1811
- A well-defined format for input contexts and returned queries
- Verification that the results are useful on at least one real cluster
The POC is a starting point to prove that we can automate correlation for a few initial cases.
It may be implemented as a stand-alone executable or as a service.
It should be written with the expectation that it will be extended with more rules.
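One way to keep the POC open to new rules is a registry of independent rule functions, so that adding a rule never requires changing the correlation engine. The sketch below assumes a context is a plain dict with optional `namespace` and `pod` keys and that each rule returns `(store, query)` pairs; the names `rule`, `RULES`, `correlate`, `namespace_logs` and `pod_events`, and the log label name, are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# A rule maps a context to zero or more (store, query) pairs.
Rule = Callable[[Dict[str, str]], List[Tuple[str, str]]]
RULES: List[Rule] = []


def rule(fn: Rule) -> Rule:
    """Decorator that registers a rule; correlate() picks it up automatically."""
    RULES.append(fn)
    return fn


@rule
def namespace_logs(ctx: Dict[str, str]) -> List[Tuple[str, str]]:
    # Applies whenever the context names a namespace.
    if "namespace" not in ctx:
        return []
    ns = ctx["namespace"]
    return [("logs", f'{{kubernetes_namespace_name="{ns}"}}')]


@rule
def pod_events(ctx: Dict[str, str]) -> List[Tuple[str, str]]:
    # Applies only when the context identifies a specific pod.
    if "namespace" not in ctx or "pod" not in ctx:
        return []
    ns, pod = ctx["namespace"], ctx["pod"]
    return [("k8s-events", f"involvedObject.namespace={ns},involvedObject.kind=Pod,involvedObject.name={pod}")]


def correlate(ctx: Dict[str, str]) -> List[Tuple[str, str]]:
    """Apply every registered rule and collect the resulting queries."""
    queries: List[Tuple[str, str]] = []
    for r in RULES:
        queries.extend(r(ctx))
    return queries
```

Each rule decides for itself whether it applies, so rules for new use cases or new signal stores can be added as separate functions without touching existing ones.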