-
Outcome
-
Resolution: Unresolved
-
Major
-
None
-
None
-
None
-
100% To Do, 0% In Progress, 0% Done
-
False
-
Outcome Overview
Upon the completion of all features and initiatives, hosted cluster administrators and Cluster Service Providers will gain enhanced visibility into control plane and data plane metrics, enabling proactive monitoring, troubleshooting, and capacity planning. This improvement directly aligns with the company’s strategic goal of ensuring stability, reliability, and user empowerment within HCP environments by providing comprehensive and actionable insights into cluster health and connectivity.
Key results include:
- Propagation of control plane metrics (e.g., csv_succeeded) to hosted cluster monitoring stacks.
- Real-time detection and reporting of connectivity issues between control and data planes.
- Enhanced monitoring capabilities across different cluster configurations, including clusters with zero workers.
- New visibility into scheduler and controller manager metrics previously unavailable.
Success Criteria
For this strategic outcome to be considered delivered, the following must be true:
- Control plane metrics, such as csv_succeeded, are queryable within the hosted cluster’s monitoring stack and telemetry tools.
- Connectivity metrics between control planes and data planes are accurately exposed and monitored through Konnectivity and data plane components.
- Critical metrics, including scheduler and controller manager metrics, are propagated reliably regardless of cluster size or worker node presence.
- Administrators can utilize existing tools (e.g., OpenShift Console, ACM) to query and monitor these metrics effectively.
- Documentation is updated to guide users on metric consumption, interpretation, and alert configuration.
Expected Results (what, how, when)
- Improved Visibility:
- What: Hosted cluster administrators gain visibility into critical control plane and connectivity metrics.
- How: Metrics such as csv_succeeded, scheduler metrics, and connectivity status are propagated to hosted clusters and queryable through telemetry and dashboards.
- Proactive Issue Detection:
- What: Reduction in downtime caused by undetected connectivity issues during HCP upgrades or configuration changes.
- How: New metrics exposed in Konnectivity pods provide real-time insights into connectivity health
- Enhanced Reporting and Capacity Planning:
- What: Comprehensive operator health and connectivity data for ACM dashboards.
- How: Integration of propagated metrics, including scheduler and controller manager data, into ACM and telemetry reporting workflows.
- depends on
-
OBSDA-451 Incorporate a clear monitoring story with self-managed Hosted Control Planes
- To Do