Type: Story
Resolution: Done
Priority: Major
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
- telemetry

Activity Type:
Product / Portfolio Work
Blocked:
False
Blocked Reason:

None
Ready:
False
Epic Link:
Telemetry for Standalone HyperShift
Story Points:
None

Target Version:
None
Release Blocker:
None
Sprint:
Hypershift Sprint 10, Hypershift Sprint 11, Hypershift Sprint 12, Hypershift Sprint 13, Hypershift Sprint 15, Hypershift Sprint 16, Hypershift Sprint 17, Hypershift Sprint 18, Hypershift Sprint 19, Hypershift Sprint 20, Hypershift Sprint 21

Risk Score:
0

There are a number of questions that we want answered about HyperShift running standalone or managed:

Source: https://docs.google.com/document/d/1RdFLy1PWVDRGLsPSMbhTFNY3u9tccy11vk_YZDegdBQ/edit?usp=sharing

Adoption Metrics

Mainly, we want to understand who is adopting HyperShift and to what extent:

How many installs per customer (i.e., how many mgmt clusters)?
How many hosted clusters per install (i.e., per management cluster)?
Which infrastructure?
How many hosted clusters?
How many HyperShiftDeployments?
How many add-ons?
How many NodePools?
How many NodePools with autoscaling?
How many NodePools with auto-repair?
Which HC configs?
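As an illustration only, the adoption counts above could be derived from a flat inventory of hosted clusters. The sketch below assumes a hypothetical record shape (`HostedClusterRecord` and its fields are not from HyperShift) and made-up sample data; it shows the aggregation, not how the data would actually be collected.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class HostedClusterRecord:
    """Hypothetical inventory row; the field names are illustrative only."""
    customer: str
    management_cluster: str
    platform: str            # infrastructure label, e.g. "AWS" (illustrative)
    node_pools: int          # total NodePools on this hosted cluster
    autoscaling_pools: int   # NodePools with autoscaling enabled
    auto_repair_pools: int   # NodePools with auto-repair enabled


def adoption_summary(records):
    """Aggregate per-customer and per-install counts from inventory rows."""
    mgmt_by_customer = {}
    for r in records:
        mgmt_by_customer.setdefault(r.customer, set()).add(r.management_cluster)
    return {
        # Installs per customer = distinct management clusters per customer.
        "installs_per_customer": {c: len(m) for c, m in mgmt_by_customer.items()},
        "hosted_clusters_per_install": dict(
            Counter(r.management_cluster for r in records)
        ),
        "hosted_clusters_per_platform": dict(Counter(r.platform for r in records)),
        "node_pools": sum(r.node_pools for r in records),
        "autoscaling_node_pools": sum(r.autoscaling_pools for r in records),
        "auto_repair_node_pools": sum(r.auto_repair_pools for r in records),
    }


# Made-up sample data for illustration.
records = [
    HostedClusterRecord("acme", "mgmt-1", "AWS", 2, 1, 2),
    HostedClusterRecord("acme", "mgmt-1", "AWS", 1, 0, 1),
    HostedClusterRecord("acme", "mgmt-2", "KubeVirt", 3, 3, 0),
    HostedClusterRecord("globex", "mgmt-3", "Agent", 1, 0, 0),
]
summary = adoption_summary(records)
```

With this sample, `summary["installs_per_customer"]` is `{"acme": 2, "globex": 1}` and `summary["node_pools"]` is `7`.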

Usage Metrics

The main question is “How are we being used, and where?” More specific questions we should try to answer:

Do customers create hosted clusters directly or use MCE’s HCD (day 2)?
What infrastructures are in use by HyperShift?
Do we have a scale limit of some sort? How do we know if we reach that limit?
From creation to cluster, where do customers struggle the most?

Impact Metrics, Examples

HyperShift is meant to benefit the business. How do we measure that? How much cost savings does it introduce? How does this reduction affect Total Cost of Ownership? Examples of questions to think about:

How much does operating 1 Hosted Control Plane (HCP) cost?
Reduce footprint/control-plane
Memory/CPU/Storage/Network per CP, how far can we get down?
…

Health Metrics

This is really about validating whether HyperShift works as expected:

Percentage of Healthy cluster installs?
Percentage of NodePool failures?
Leading causes for failures?
API errors, why?
Throughput or Data loss?
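A minimal sketch of how the first three health questions might be computed, assuming each install or NodePool outcome is available as a `(succeeded, failure_reason)` pair. The pair shape, the function name, and the failure reason strings are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def health_summary(install_outcomes, node_pool_outcomes):
    """Summarize outcomes given as (succeeded: bool, failure_reason or None)."""

    def pct(part, total):
        # Avoid division by zero when there is no data yet.
        return 100.0 * part / total if total else 0.0

    healthy_installs = sum(1 for ok, _ in install_outcomes if ok)
    failed_pools = sum(1 for ok, _ in node_pool_outcomes if not ok)
    reasons = Counter(
        reason
        for ok, reason in list(install_outcomes) + list(node_pool_outcomes)
        if not ok and reason
    )
    return {
        "healthy_install_pct": pct(healthy_installs, len(install_outcomes)),
        "node_pool_failure_pct": pct(failed_pools, len(node_pool_outcomes)),
        "leading_causes": reasons.most_common(3),
    }


# Made-up outcomes; the reason strings are placeholders, not real conditions.
installs = [
    (True, None), (True, None), (False, "InfraNotReady"), (True, None),
    (False, "IgnitionTimeout"), (False, "InfraNotReady"), (True, None), (True, None),
]
node_pools = [(True, None), (False, "MachineProvisionFailed"), (True, None), (True, None)]
health = health_summary(installs, node_pools)
```

Here 5 of 8 installs are healthy (62.5%), 1 of 4 NodePools failed (25%), and the most frequent failure reason is the placeholder `"InfraNotReady"`.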

We need to ensure that the HyperShift operator is producing the metrics that will answer these questions and that the metrics are being sent/consumed by telemetry.
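For orientation, this is roughly what a metric looks like on the wire, assuming it is exposed in the Prometheus text exposition format. The metric name `example_hosted_clusters` and its `platform` label are hypothetical and do not correspond to any metric the HyperShift operator is known to emit; the helper is a hand-rolled illustration, not a client library.

```python
def render_gauge(name, help_text, samples):
    """Render one gauge family in the Prometheus text exposition format.

    samples is an iterable of (labels_dict, value) pairs. Label values are
    assumed to need no escaping in this sketch.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        if label_str:
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric name and label, for illustration only.
text = render_gauge(
    "example_hosted_clusters",
    "Number of hosted clusters by platform (hypothetical metric).",
    [({"platform": "AWS"}, 2), ({"platform": "KubeVirt"}, 1)],
)
```

A real operator would normally register such metrics through a Prometheus client library rather than format them by hand.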

links to

NodePools and HC metrics

Assignee: Cesar Wong

Reporter: Cesar Wong

QA Contact: Jie Zhao

Need Info From: None

Votes: 0

Watchers: 3

Created: 2022/04/20 8:54 PM

Updated: 2025/06/27 9:47 AM

Resolved: 2022/12/13 5:57 PM
