• Analytics MVP

      It enhances the Distributed Tracing experience on OpenShift and gives our customers a better understanding of their traces.

      Relevance and Importance

      Today, Jaeger usage is largely limited to collecting traces and presenting the traditional trace-view diagram. Arguably, this can be perceived as little value for what some consider a big investment. Incrementally building analytics capabilities based on early feedback would help Red Hat steer its distributed tracing efforts and unleash the full potential of traces for helping SREs.

      A. MVP Strategy

      To enable a quick feedback loop with users, customers, and/or internal teams to whom we show the analytics capabilities, it would be desirable to rely on an existing interactive interface, such as Jupyter Notebooks. Following this approach, we would implement the analytics supporting our initial key use cases as functions in a Notebook. These functions would serve as entry points for the key use cases we will support. Critically, they would process traces stored in the Jaeger backend. This way, the demos would involve both the actual Jaeger UI and the Notebook with analytics.
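      As a concrete illustration, a minimal sketch of such an entry-point function is shown below. It assumes a Jaeger query service reachable at localhost:16686 and uses /api/traces, the unversioned HTTP API that backs the Jaeger UI; the URL, function name, and parameters are illustrative assumptions, not a committed design.

      import requests

      # Assumption: Jaeger query service at this address; /api/traces is the
      # internal HTTP API behind the Jaeger UI, not a stable public contract.
      JAEGER_QUERY_URL = "http://localhost:16686"

      def fetch_traces(service, start_us, end_us, limit=100):
          """Fetch raw traces for a service over a time interval (epoch microseconds)."""
          resp = requests.get(
              f"{JAEGER_QUERY_URL}/api/traces",
              params={"service": service, "start": start_us, "end": end_us, "limit": limit},
          )
          resp.raise_for_status()
          return resp.json()["data"]  # list of traces, each carrying a "spans" list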


      When demonstrating to the target audience, we can explain that the function invocations they see correspond to a UI flow yet to be designed. Similarly, each invoked function should produce useful visualizations in the Notebook. As the capabilities are demonstrated, we can point out that the visualizations embedded in the Notebook could later be incorporated into one or more UI views yet to be designed.
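      For example, a hypothetical Notebook helper could render such an inline visualization with matplotlib (the function name and units are illustrative assumptions):

      import matplotlib.pyplot as plt

      def plot_latency_histogram(durations_us, title):
          """Plot a latency histogram; in a Notebook the figure is embedded inline."""
          plt.hist([d / 1000.0 for d in durations_us], bins=30)
          plt.xlabel("span duration (ms)")
          plt.ylabel("count")
          plt.title(title)
          plt.show()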


      This approach would be agile. Not only will it enable us to get early feedback on the perceived value of the analytics capabilities, but it will also encourage user feedback on the UI views prior to or during their design.


      B. Initial Use Cases


      Below is a list of initial use cases we can aim for.

      B.1 Service-centric troubleshooting


      Given a time interval and the name of a problematic service, show the user:

      1. Representative trace aggregates, including histograms of latency and error rates for each span, and counts of retries/timeouts.
      2. The critical path in each representative aggregate, i.e., the chain of spans that dominates the end-to-end latency.
      3. The tag that most strongly correlates with the critical path (and the corresponding correlation coefficient), which should help "explain" the critical path. Finding which tag correlates with the critical path might help identify, for instance, that a slowdown occurs when a particular service version is invoked or when the call comes from a particular server. (A sketch follows this list.)
      4. The histogram of calls made by the chosen service to its downstream services, highlighting how frequently each downstream service is called and statistics on durations and errors.
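      The sketch below illustrates items (1) and (3) under stated assumptions: it relies on the Jaeger JSON span shape (operationName, duration in microseconds, tags as key/value pairs), and it approximates "correlation with the critical path" with a point-biserial correlation between tag presence and span duration, applied, for example, to the spans collected along critical paths. The helper names are hypothetical.

      import statistics
      from collections import defaultdict

      def span_aggregates(traces):
          """Item (1): per-operation latency and error statistics from Jaeger JSON spans."""
          durations, errors = defaultdict(list), defaultdict(int)
          for trace in traces:
              for span in trace["spans"]:
                  op = span["operationName"]
                  durations[op].append(span["duration"])  # microseconds
                  if any(t["key"] == "error" and t["value"] for t in span["tags"]):
                      errors[op] += 1
          return {op: {"count": len(ds), "p50_us": statistics.median(ds),
                       "max_us": max(ds), "error_rate": errors[op] / len(ds)}
                  for op, ds in durations.items()}

      def best_explaining_tag(spans):
          """Item (3), simplified: the tag whose presence correlates most strongly
          with span duration (point-biserial correlation)."""
          durations = [s["duration"] for s in spans]
          stdev = statistics.pstdev(durations)
          if stdev == 0:
              return None, 0.0
          candidates = {(t["key"], str(t["value"])) for s in spans for t in s["tags"]}
          best_tag, best_r = None, 0.0
          for tag in candidates:
              flags = [any((t["key"], str(t["value"])) == tag for t in s["tags"])
                       for s in spans]
              p = sum(flags) / len(flags)
              if p in (0.0, 1.0):
                  continue  # tag present on all or no spans: no signal
              mean_with = statistics.mean(d for d, f in zip(durations, flags) if f)
              mean_without = statistics.mean(d for d, f in zip(durations, flags) if not f)
              r = (mean_with - mean_without) / stdev * (p * (1 - p)) ** 0.5
              if abs(r) > abs(best_r):
                  best_tag, best_r = tag, r
          return best_tag, best_r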


      B.2 Trace aggregation (group-by operations)


      Given a time interval and a group-by parameter, show the user:

      1. Trace aggregates, including histograms for latency and error rates for each span, and counts of retries/timeouts.

      We can choose an initial group-by operation to start with and incrementally add aggregations based on a variety of parameters, as sketched below.
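      As a sketch of one plausible initial group-by, the function below groups traces by the value of a chosen tag on the root span (approximated as the earliest-starting span) and reuses span_aggregates from the B.1 sketch; the tag key and helper names are assumptions.

      from collections import defaultdict

      def aggregate_by_tag(traces, tag_key):
          """Group traces by tag_key's value on the root span, then aggregate each
          group with span_aggregates (defined in the B.1 sketch)."""
          groups = defaultdict(list)
          for trace in traces:
              # Approximation: treat the earliest-starting span as the root span.
              root = min(trace["spans"], key=lambda s: s["startTime"])
              value = next((t["value"] for t in root["tags"] if t["key"] == tag_key),
                           "unknown")
              groups[value].append(trace)
          return {value: span_aggregates(ts) for value, ts in groups.items()}

      # e.g. aggregate_by_tag(traces, "http.status_code")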


      B.3 Trace aggregation diff 


      Given two aggregates produced as in B.2, highlight key differences between them. The differences might include significant changes in (1) latency distributions, (2) error rates, and (3) graph shape.
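      A minimal diff sketch over aggregates shaped like the output of span_aggregates (B.1 sketch) could look as follows; the thresholds are illustrative assumptions, and graph-shape changes (3) are only hinted at via operations present in one aggregate but not the other.

      def diff_aggregates(agg_a, agg_b, latency_factor=1.5, error_delta=0.05):
          """Highlight notable latency (1) and error-rate (2) differences between
          two aggregates; thresholds are illustrative, not tuned."""
          findings = []
          for op in sorted(agg_a.keys() & agg_b.keys()):
              a, b = agg_a[op], agg_b[op]
              if b["p50_us"] > latency_factor * a["p50_us"]:
                  findings.append((op, "median latency up", a["p50_us"], b["p50_us"]))
              if abs(b["error_rate"] - a["error_rate"]) > error_delta:
                  findings.append((op, "error rate shift", a["error_rate"], b["error_rate"]))
          # Operations present in only one aggregate hint at a graph-shape change (3).
          for op in sorted(agg_a.keys() ^ agg_b.keys()):
              findings.append((op, "present in only one aggregate", None, None))
          return findings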


      C. Timeline for MVP

      Following the agile methodology, we should, as a team, choose one use case to start with so that we can have an end-to-end implementation via Notebook. As we get feedback on the first use case, we move on to the next; in parallel, work on the actual UI design might be desirable.

      We can plan to have at least the first Notebook-based use case (chosen as a team) ready by August 1st, 2021. Use cases implemented after that should move relatively quickly, capitalizing on the initial efforts to both generate test traces and put together a foundational development and testbed environment. Note that B.1 items (1)-(4) can be thought of as individual use cases.


      How does this align with the strategy?

      • It focuses on enhancing the capabilities of the backend.
      • This can be a good way to bring "smart" capabilities into Distributed Tracing and collect feedback from our customers.
      • Analytics capabilities will be incredibly important for a potential Hosted Distributed Tracing service and a major differentiating capability.
