Story
Resolution: Unresolved
Product / Portfolio Work
Summary
Implement the metrics-proxy component that runs as a deployment in the HCP namespace, exposed via an OpenShift Route with TLS passthrough. This component authenticates requests from the guest cluster's Prometheus, fans out scrape requests to control plane pods, and returns merged metrics with injected labels.
Scope
- *Deployment & Route*: Create the metrics-proxy Deployment, managed by the CPO as a new ControlPlaneComponent. Expose it via an OpenShift Route with TLS passthrough.
- *Authentication*: Implement bearer token authentication using TokenReview against the hosted kube-apiserver. Cache successful validation results with a 5-minute TTL. Return 401/403 for unauthorized requests.
- *Endpoint Resolution*: Integrate with the endpoint-resolver component to discover pod endpoints for each control plane component. Use an HTTPS client to query the endpoint resolver's API.
- *Metrics Scraping*: Fan out parallel scrape requests to the individual component pods discovered via the endpoint resolver. Handle per-pod scrape failures gracefully: return metrics from responsive pods rather than blocking or failing the entire request.
- *Label Injection*: Inject standardized labels (job, namespace, service, endpoint, pod) matching standalone OCP conventions for each component. Read label injection values from hypershift.openshift.io/metrics-* annotations on ServiceMonitor/PodMonitor resources.
- *MetricsSet Filtering*: Apply per-component metric allow-lists based on the HostedControlPlane's MetricsSet configuration (All, SRE, or Telemetry).
- *Configuration Generation*: The CPO should dynamically generate the proxy configuration by enumerating ServiceMonitors and PodMonitors in the HCP namespace and extracting upstream connection details from their spec fields.
- *TLS*: Serve HTTPS using certificates that match the Route hostname for SNI routing. Generate and manage the metrics-proxy serving certificate and CA bundle.
- *Resilience*: Implement readiness probes that detect kube-apiserver connectivity issues. Return HTTP 503 with retry headers when the service is unavailable.
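The Metrics Scraping item above could be approached roughly as follows. This is only a sketch under stated assumptions: the names `scrapeAll`, `scrapeResult`, `fetchFunc`, and `httpFetch`, the per-pod timeout, and the target URLs are all hypothetical and not taken from the HyperShift code base. It shows one goroutine per pod, with each failure recorded per target so that responsive pods still contribute metrics.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"sync"
	"time"
)

// scrapeResult holds the outcome of scraping a single pod (hypothetical type).
type scrapeResult struct {
	Target string
	Body   []byte
	Err    error
}

// fetchFunc abstracts the HTTP GET so it can be replaced in tests.
type fetchFunc func(ctx context.Context, url string) ([]byte, error)

// httpFetch returns a fetchFunc backed by a real HTTP client.
func httpFetch(client *http.Client) fetchFunc {
	return func(ctx context.Context, url string) ([]byte, error) {
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
		if err != nil {
			return nil, err
		}
		resp, err := client.Do(req)
		if err != nil {
			return nil, err
		}
		defer resp.Body.Close()
		if resp.StatusCode != http.StatusOK {
			return nil, fmt.Errorf("unexpected status %d from %s", resp.StatusCode, url)
		}
		return io.ReadAll(resp.Body)
	}
}

// scrapeAll scrapes every target in parallel. A failing or slow pod only
// affects its own result; the caller decides what to do with partial data.
func scrapeAll(ctx context.Context, targets []string, perPodTimeout time.Duration, fetch fetchFunc) []scrapeResult {
	results := make([]scrapeResult, len(targets))
	var wg sync.WaitGroup
	for i, t := range targets {
		wg.Add(1)
		go func(i int, t string) {
			defer wg.Done()
			podCtx, cancel := context.WithTimeout(ctx, perPodTimeout)
			defer cancel()
			body, err := fetch(podCtx, t)
			// Each goroutine writes to its own index, so no lock is needed.
			results[i] = scrapeResult{Target: t, Body: body, Err: err}
		}(i, t)
	}
	wg.Wait()
	return results
}

// demoScrape runs scrapeAll against a fake fetcher in which one pod fails.
func demoScrape() (succeeded, failed int) {
	fake := func(ctx context.Context, url string) ([]byte, error) {
		if url == "https://10.0.0.2:8443/metrics" {
			return nil, fmt.Errorf("connection refused")
		}
		return []byte("up 1\n"), nil
	}
	targets := []string{
		"https://10.0.0.1:8443/metrics",
		"https://10.0.0.2:8443/metrics",
		"https://10.0.0.3:8443/metrics",
	}
	for _, r := range scrapeAll(context.Background(), targets, time.Second, fake) {
		if r.Err != nil {
			failed++
			continue
		}
		succeeded++
	}
	return succeeded, failed
}

func main() {
	ok, bad := demoScrape()
	fmt.Printf("succeeded=%d failed=%d\n", ok, bad)
}
```

Collecting results by index keeps the output order stable, which may make merged responses easier to test. Whether a partial failure should also be surfaced to the caller (for example through a synthetic metric) is a design decision this sketch leaves open.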
Design Notes
- The metrics proxy holds zero management cluster RBAC (security boundary). All pod discovery is delegated to the endpoint resolver.
- TokenReview caching reduces API load while handling token expiry correctly.
- Label injection ensures existing OpenShift Console dashboards and alerting rules function without modification.
- Must support dynamic addition/removal of components via day-2 operations.
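To illustrate the TokenReview caching note, here is a minimal sketch of a success-only cache with a TTL. Everything in it is an assumption made for illustration: the `tokenCache` type, the `review` callback (a stand-in for the real TokenReview call against the hosted kube-apiserver), the injectable clock, and the choice to key the cache by a SHA-256 hash of the token instead of the raw token.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sync"
	"time"
)

// tokenCache remembers tokens that were successfully validated (hypothetical type).
type tokenCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	now     func() time.Time       // injectable clock, useful for tests
	entries map[[32]byte]time.Time // token hash -> cache expiry
}

func newTokenCache(ttl time.Duration, now func() time.Time) *tokenCache {
	return &tokenCache{ttl: ttl, now: now, entries: map[[32]byte]time.Time{}}
}

// valid reports whether the token has an unexpired cache entry.
func (c *tokenCache) valid(token string) bool {
	key := sha256.Sum256([]byte(token))
	c.mu.Lock()
	defer c.mu.Unlock()
	expiry, ok := c.entries[key]
	if !ok {
		return false
	}
	if !c.now().Before(expiry) {
		delete(c.entries, key) // entry expired, force a fresh review
		return false
	}
	return true
}

// remember stores a successful validation for the configured TTL.
func (c *tokenCache) remember(token string) {
	key := sha256.Sum256([]byte(token))
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[key] = c.now().Add(c.ttl)
}

// authenticate consults the cache first and falls back to review.
// Only successful validations are cached; failures are always re-checked.
func (c *tokenCache) authenticate(token string, review func(string) (bool, error)) (bool, error) {
	if c.valid(token) {
		return true, nil
	}
	ok, err := review(token)
	if err != nil || !ok {
		return false, err
	}
	c.remember(token)
	return true, nil
}

// demoCache counts how often the review callback runs before and after the TTL.
func demoCache() (callsWithinTTL, callsAfterTTL int) {
	clock := time.Unix(0, 0)
	cache := newTokenCache(5*time.Minute, func() time.Time { return clock })
	calls := 0
	review := func(token string) (bool, error) {
		calls++
		return token == "good-token", nil
	}
	cache.authenticate("good-token", review)
	cache.authenticate("good-token", review) // served from cache
	callsWithinTTL = calls
	clock = clock.Add(6 * time.Minute) // move past the 5-minute TTL
	cache.authenticate("good-token", review)
	callsAfterTTL = calls
	return callsWithinTTL, callsAfterTTL
}

func main() {
	within, after := demoCache()
	fmt.Printf("review calls within TTL=%d, after TTL=%d\n", within, after)
}
```

With this shape, a revoked or expired token can still be accepted for up to one TTL after its last successful review, so the 5-minute value trades API load against how quickly revocation takes effect.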
Acceptance Criteria
- metrics-proxy Deployment is created and managed by the CPO
- Route with TLS passthrough exposes the metrics proxy externally
- TokenReview authentication works correctly against the hosted kube-apiserver
- Parallel scraping of control plane pods with graceful failure handling
- Label injection produces metrics matching standalone OCP conventions
- MetricsSet filtering correctly applies per-component allow-lists
- Unit tests cover authentication, scraping, label injection, and filtering logic
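As a starting point for the label injection unit tests, the following sketch injects labels into a single line of Prometheus text exposition output. The function name `injectLabels` and the example label values are hypothetical. The string handling is deliberately simple: it does not resolve conflicts with labels the pod already exposes, and a real implementation would more likely use a proper exposition-format parser.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// injectLabels adds the given labels to one sample line of Prometheus text
// format. Comment lines (# HELP, # TYPE) and blank lines are returned as-is.
// Labels are emitted in sorted key order so the output is deterministic.
func injectLabels(line string, labels map[string]string) string {
	trimmed := strings.TrimSpace(line)
	if trimmed == "" || strings.HasPrefix(trimmed, "#") || len(labels) == 0 {
		return line
	}

	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		// %q quotes and escapes the value; assumed adequate for simple values.
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	injected := strings.Join(pairs, ",")

	// The metric name ends at the first '{' (label set) or ' ' (value).
	idx := strings.IndexAny(trimmed, "{ ")
	if idx < 0 {
		return line // malformed sample, leave untouched
	}
	name, rest := trimmed[:idx], trimmed[idx:]
	if rest[0] == '{' {
		if strings.HasPrefix(rest, "{}") {
			return name + "{" + injected + rest[1:]
		}
		// Prepend injected labels to the existing label set.
		return name + "{" + injected + "," + rest[1:]
	}
	return name + "{" + injected + "}" + rest
}

func main() {
	labels := map[string]string{"job": "etcd", "namespace": "openshift-etcd"}
	for _, l := range []string{
		"# TYPE up gauge",
		"up 1",
		`http_requests_total{code="200"} 5`,
	} {
		fmt.Println(injectLabels(l, labels))
	}
}
```

A table-driven test could feed lines like these through the function and compare against the label sets used by the corresponding standalone OCP ServiceMonitor.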
Dependencies
- Requires the endpoint-resolver component (story 1) to be implemented first