OpenShift Container Platform (OCP) Strategy / OCPSTRAT-656

Incorporate a clear monitoring story for self-managed Hosted Control Planes

    • BU Product Work
    • 0% To Do, 0% In Progress, 100% Done

      Overview

      This feature aims to provide a monitoring story for customers running self-managed Hosted Control Planes (ACM/MCE with HCP). As the MVP for customers who do not use ACM, it reuses the pluggable dashboard feature of the OCP console. The goal is better observability and an improved user experience. An example of how such a dashboard can be configured is below:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        labels:
          console.openshift.io/dashboard: "true"
        name: basic-hcp-dashboard
        namespace: hypershift
      data: ...

      Key Considerations

      • Dashboards are created only when the customer opts in to exporting all metrics (not just the telemetry set). By default, only a subset of metrics is exported to avoid overloading the monitoring stack.
      • The dashboard will track key Service Level Indicators (SLIs) and Service Level Objectives (SLOs) such as API availability, API server error rate, and resource usage of the rest of the control plane; latency between the control plane and the workers may follow later. We will start with the three metrics that are easiest to implement.
      • We aim to provide a pragmatic, if not aesthetically perfect, monitoring experience without muddling our ACM messaging. The north star remains the ACM observability stack as the sustainable, comprehensive monitoring solution.
      • Dashboard configuration is per HCP, with each HCP living in its own OpenShift project (namespace). This is compatible with the tenancy model of User Workload Monitoring (UWM).
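      To make the SLI list more concrete, here is a minimal sketch of how the first two indicators could be precomputed as recording rules in an HCP namespace. It assumes that UWM scrapes the hosted kube-apiserver in that namespace, as the tenancy point above implies. The rule names, the `clusters-example` namespace and the `job="kube-apiserver"` label are hypothetical. `apiserver_request_total` and `up` are standard Kubernetes/Prometheus metrics, and `PrometheusRule` is the Prometheus Operator resource that UWM consumes.

```yaml
# Hypothetical sketch: rule names, namespace and job label are assumptions.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: hcp-sli-example          # hypothetical name
  namespace: clusters-example    # assumption: the namespace of one HCP
spec:
  groups:
    - name: hcp-sli.rules
      rules:
        # API server error rate: share of requests answered with a 5xx code over 5 minutes.
        - record: hcp:apiserver_request_error_ratio:rate5m
          expr: |
            sum(rate(apiserver_request_total{code=~"5.."}[5m]))
            /
            sum(rate(apiserver_request_total[5m]))
        # Rough API availability proxy: fraction of kube-apiserver scrape targets that are up.
        # Assumption: the hosted kube-apiserver is scraped under job="kube-apiserver".
        - record: hcp:apiserver_up:ratio
          expr: avg(up{job="kube-apiserver"})
```

      Dashboard panels could then query the recorded series instead of repeating the raw expressions. Keeping the rule in the namespace of each HCP fits the per-HCP model, because UWM scopes rules in a user namespace to the series from that namespace.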

      Open Discussion / Long-term Concerns 

      The usage of UWM for HCP metrics on the management cluster has a few drawbacks:

      • Configuration via ConfigMaps is more error-prone and less GitOps-friendly.
      • There are fewer configuration knobs than the Observability Operator (ObO) offers out of the box, and the delivery model is slower because it is bound to the OCP release cadence.

      Using ObO, which is currently being productized, would resolve these issues.

      Acceptance Criteria

      1. Introduction of custom dashboards via the OCP console dashboard plugin feature.
      2. The dashboard provides monitoring and tracking for the agreed-upon SLIs/SLOs.
      3. The dashboard configuration is per HCP, aligning with the tenancy model of UWM.
      4. Successful communication and cooperation with the rest of the team, so that no details are missed and our documentation communicates the right story to the customer.

              Antoni Segura Puimedon (asegurap1@redhat.com)
              Adel Zaalouk (azaalouk)
              Cesar Wong, Daniel Mohr, Derek Carr, Eric Paris, Roger Florén
              Laura Hinson
              Cesar Wong
              Adel Zaalouk