Uploaded image for project: 'Red Hat OpenShift Control Planes'
  1. Red Hat OpenShift Control Planes
  2. CNTRLPLANE-2836

Expose HCP control plane metrics to data plane

XMLWordPrintable

    • Icon: Epic Epic
    • Resolution: Unresolved
    • Icon: Major Major
    • None
    • None
    • None
    • Expose HCP control plane metrics to data plane
    • In Progress
    • Product / Portfolio Work
    • OCPSTRAT-1852[Observability] Improve control plane metric reporting in hosted cluster monitoring stack
    • 67% To Do, 33% In Progress, 0% Done
    • False
    • Hide

      None

      Show
      None
    • False
    • None
    • None
    • None

      Overview

      This epic implements the Hosted Control Plane Metrics Exposure enhancement (linked to OCPSTRAT-1852).

      Currently, control plane metrics for hosted clusters are invisible to the guest cluster's monitoring stack. This enhancement introduces infrastructure to expose management cluster control plane component metrics to the data plane's Prometheus instance, enabling existing OpenShift Console dashboards and alerting rules to function without modification.

      Problem Statement

      In HyperShift, control plane components (kube-apiserver, etcd, kube-controller-manager, etc.) run as pods on the management cluster. The guest cluster's Prometheus instance cannot scrape these components because they are not reachable from the data plane network. This results in:

      • Empty control plane dashboards in the OpenShift Console
      • Non-functional control plane alerting rules
      • Reduced observability for cluster administrators

      Solution Architecture

      The solution introduces three new components that form a secure metrics pipeline from management cluster control plane pods to the guest cluster's Prometheus:

      Endpoint Resolver (management cluster, internal): Watches control plane pods using label selectors derived from ServiceMonitor/PodMonitor specs. Serves pod IP and name resolution requests over HTTPS. Holds all management cluster RBAC for listing Pods, keeping the externally-exposed metrics proxy free of direct API server access.

      Metrics Proxy (management cluster, exposed via Route): Authenticates requests via TokenReview against the hosted kube-apiserver. Discovers pod endpoints through the endpoint resolver. Fans out parallel scrape requests to individual component pods. Injects standardized labels (job, namespace, service, endpoint, pod) matching standalone OCP conventions. Applies per-component metric filtering based on the configured MetricsSet.

      Guest Cluster Metrics Forwarder (guest cluster): A lightweight HAProxy deployment operating in TCP proxy mode that forwards connections from Prometheus to the metrics proxy route. Requires no certificate management, enabling end-to-end TLS between Prometheus and the metrics proxy.

      Key Design Properties

      _ _Security*: The metrics proxy (exposed via route) holds zero management cluster RBAC. Only the internal endpoint resolver accesses the Pod API. TokenReview validates all requests from the data plane.
      _ _Label Injection*: Labels are injected to match standalone OCP conventions so existing dashboards and alerts work without modification.
      _ _Dynamic Discovery*: Components are discovered dynamically from ServiceMonitors/PodMonitors rather than a hardcoded registry, supporting day-2 operations.
      _ _MetricsSet Filtering*: Per-component allow-lists based on HostedControlPlane MetricsSet configuration (All, SRE, or Telemetry).

      Enhancement Reference

      Enhancement Proposal

              Unassigned Unassigned
              rh-ee-mraee Mulham Raee
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

                Created:
                Updated: