  1. OpenShift Hosted Control Plane
  2. HOSTEDCP-401

Ensure relevant HyperShift metrics are sent to telemetry


    • Type: Story
    • Resolution: Done
    • Priority: Major
    • Sprints: Hypershift Sprint 10, Hypershift Sprint 11, Hypershift Sprint 12, Hypershift Sprint 13, Hypershift Sprint 15, Hypershift Sprint 16, Hypershift Sprint 17, Hypershift Sprint 18, Hypershift Sprint 19, Hypershift Sprint 20, Hypershift Sprint 21

      There are a number of questions that we want answered about HyperShift running standalone or managed:

      Source: https://docs.google.com/document/d/1RdFLy1PWVDRGLsPSMbhTFNY3u9tccy11vk_YZDegdBQ/edit?usp=sharing

      Adoption Metrics

      Mainly, we want to understand who is adopting HyperShift, and how much.

      • How many installs per customer (i.e., how many mgmt clusters)?
      • How many hosted clusters per install (i.e., per management cluster)?
      • Which Infrastructure?
      • How many Hosted Clusters?
      • How many HyperShiftDeployments?
      • How many Add-ons?
      • How many NodePools?
      • How many NodePools use autoscaling?
      • How many NodePools use auto-repair?
      • Which HostedCluster (HC) configurations are in use?
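
      The counts above can be derived from an inventory of hosted clusters and their NodePools. The following is a minimal sketch of that aggregation, not the HyperShift operator's implementation: the `HostedCluster` and `NodePool` records, their fields, and `adoption_summary` are hypothetical, simplified stand-ins for the real API types.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class NodePool:
    # Hypothetical, simplified record; not the real NodePool API type.
    name: str
    autoscaling: bool = False
    auto_repair: bool = False


@dataclass
class HostedCluster:
    # Hypothetical, simplified record; not the real HostedCluster API type.
    name: str
    management_cluster: str  # the install hosting this control plane
    platform: str            # infrastructure label (assumed free-form here)
    node_pools: List[NodePool] = field(default_factory=list)


def adoption_summary(clusters):
    """Aggregate the adoption counts from a list of HostedCluster records."""
    pools = [pool for hc in clusters for pool in hc.node_pools]
    return {
        # Only management clusters with at least one hosted cluster are seen.
        "management_clusters": len({hc.management_cluster for hc in clusters}),
        "hosted_clusters": len(clusters),
        "hosted_clusters_per_management_cluster": dict(
            Counter(hc.management_cluster for hc in clusters)
        ),
        "hosted_clusters_per_platform": dict(
            Counter(hc.platform for hc in clusters)
        ),
        "node_pools": len(pools),
        "node_pools_autoscaling": sum(1 for p in pools if p.autoscaling),
        "node_pools_auto_repair": sum(1 for p in pools if p.auto_repair),
    }
```

      Counting installs per customer would additionally need a customer identifier on each record, which this sketch does not model.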

      Usage Metrics

      Mainly, we are trying to answer “How are we being used, and where?” More specific questions we should try to answer:

      • Do customers create hosted clusters directly or use MCE’s HCD (day 2)?
      • What infrastructures are in use by HyperShift? 
      • Do we have a scale limit of some sort? How do we know if we reach that limit?
      • From creation to cluster, where do customers struggle the most? 

      Impact Metrics, Examples

      HyperShift is doing good for the business. How do we measure that? How much in cost savings are we introducing? How does this reduction impact Total Cost of Ownership? Examples of questions to think about:

      • How much does operating 1 Hosted Control Plane (HCP) cost? 
      • How much can we reduce the footprint per control plane?
      • Memory, CPU, storage, and network per control plane (CP): how far down can we get?

      Health Metrics

      This is really about validating whether HyperShift works as expected.

      • Percentage of Healthy cluster installs?
      • Percentage of NodePool failures?
      • Leading causes for failures?
      • API errors, why? 
      • Throughput or Data loss?
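
      The first three health questions reduce to simple ratios and a tally of failure reasons. The sketch below shows one way to compute them, assuming each install or NodePool outcome is available as a `(succeeded, failure_reason)` pair. That input shape, the reason strings, and the function names are assumptions made for illustration.

```python
from collections import Counter
from itertools import chain


def percentage(part, total):
    """Return part/total as a percentage, or None if there is nothing to measure."""
    if total == 0:
        return None
    return 100.0 * part / total


def health_summary(install_results, node_pool_results, top_n=3):
    """Summarize health from (succeeded, failure_reason) pairs.

    failure_reason is expected to be None when succeeded is True.
    """
    healthy_installs = sum(1 for ok, _ in install_results if ok)
    failed_pools = sum(1 for ok, _ in node_pool_results if not ok)
    # Tally reasons across both kinds of failure to find the leading causes.
    reasons = Counter(
        reason
        for ok, reason in chain(install_results, node_pool_results)
        if not ok and reason
    )
    return {
        "healthy_install_pct": percentage(healthy_installs, len(install_results)),
        "node_pool_failure_pct": percentage(failed_pools, len(node_pool_results)),
        "leading_failure_causes": reasons.most_common(top_n),
    }
```

      Returning `None` for an empty population avoids reporting 0% or 100% when no installs or NodePools have been observed yet.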

      We need to ensure that the HyperShift operator is producing the metrics that will answer these questions and that the metrics are being sent to and consumed by telemetry.
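
      To make "producing the metrics" concrete, the sketch below renders a labeled gauge in the Prometheus text exposition format using only the standard library. It illustrates the shape of the data, not how the operator does it: a real operator would normally use a Prometheus client library, and the metric name `example_hosted_clusters` and its `platform` label are hypothetical. How a metric is then selected for forwarding to telemetry is outside the scope of this sketch.

```python
def _escape_label_value(value):
    # Label values escape backslash, double quote, and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def _escape_help(text):
    # HELP text escapes backslash and newline.
    return text.replace("\\", "\\\\").replace("\n", "\\n")


def render_gauge(name, help_text, samples):
    """Render one gauge family as Prometheus text exposition format.

    samples is a list of (labels, value) pairs, where labels is a dict.
    """
    lines = [
        f"# HELP {name} {_escape_help(help_text)}",
        f"# TYPE {name} gauge",
    ]
    for labels, value in samples:
        # Sort labels so the output is stable across runs.
        pairs = ",".join(
            f'{key}="{_escape_label_value(str(val))}"'
            for key, val in sorted(labels.items())
        )
        selector = "{" + pairs + "}" if pairs else ""
        lines.append(f"{name}{selector} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical metric name and label, for illustration only.
    print(
        render_gauge(
            "example_hosted_clusters",
            "Number of hosted clusters by platform.",
            [({"platform": "AWS"}, 3), ({"platform": "Agent"}, 1)],
        ),
        end="",
    )
```

      Keeping label sets small and low in cardinality (for example, platform rather than cluster name) keeps the number of series per management cluster bounded.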

            Cesar Wong (cewong@redhat.com)
            Jie Zhao