- Feature Request
- Resolution: Unresolved
- Critical
- Product / Portfolio Work
This is a follow-up RFE to OCPBUGS-59414.
1. Proposed title of this feature request
Observability for control plane health
2. What is the nature and description of the request?
ClusterOperators in classic OCP provide health metrics and statuses for aggregated components. This is not the case in HCP, where some control plane components are deployed directly by the control plane operator. For some components, the existing ClusterOperators and CVO metrics are a red herring: they are mocked by the hosted cluster config operator and do not reflect the actual health of the control plane components. Other components have no cluster operator at all, and there are no metrics or statuses that present the health of the corresponding cluster operations.
I found the following components/cluster operations that previously existed as cluster operators and whose health is now hard to track, as tracking it would require re-creating custom observability logic:
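To illustrate why mocked statuses are a problem, here is a minimal sketch of how a consumer might collapse ClusterOperator-style conditions (`Available`, `Progressing`, `Degraded`) into a single health value. The function name, the three result strings, and the sample data are assumptions made for illustration only; they are not taken from a real cluster or from any existing tooling.

```python
from typing import Dict, List


def summarize_conditions(conditions: List[Dict[str, str]]) -> str:
    """Collapse ClusterOperator-style conditions into one health word."""
    # Index the condition status by type, e.g. {"Available": "True", ...}.
    status = {c["type"]: c["status"] for c in conditions}
    # A missing or non-True Available condition is treated as unavailable.
    if status.get("Available") != "True":
        return "unavailable"
    if status.get("Degraded") == "True":
        return "degraded"
    return "healthy"


# Hypothetical sample data, not taken from a real cluster.
sample = [
    {"type": "Available", "status": "True"},
    {"type": "Progressing", "status": "False"},
    {"type": "Degraded", "status": "True", "reason": "ExampleReason"},
]
print(summarize_conditions(sample))
```

A check of this kind can only be as accurate as the conditions it reads: if the conditions are mocked, it reports whatever the mock says rather than the real state of the component.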
- openshift-apiserver (cluster operator present, but health mocked)
- openshift-controller-manager (cluster operator present, but health mocked)
- kube-apiserver (cluster operator present, but health mocked)
- kube-controller-manager (cluster operator present, but health mocked)
- kube-scheduler (cluster operator present, but health mocked)
- operator-lifecycle-manager-packageserver (cluster operator present, but health mocked)
- authentication
- cloud-controller-manager
- cloud-credential
- cluster-autoscaler
- etcd
- machine-approver
- storage
- marketplace
Additionally, we would like to easily monitor, via metrics and statuses, the health of new components added with HCP, including but not limited to:
- CAPI (previously present as the machine-api cluster operator in classic) / ignition / etc.
- hosted-cluster-config-operator
- control-plane-operator
This RFE requests a replacement for ClusterOperators in HCP. It should report to service providers running the control plane the health of each component and, where applicable, the cause of any degradation.
For components that run on both the control plane and the data plane (e.g. cloud-controller-manager), the service provider should be able to distinguish between the state of the control plane and the state of the data plane. Ideally, it should be clear whether a degradation is caused by a misconfiguration / modification in the HostedCluster user's data plane or cloud environment, or by something on the service provider's side.
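As a rough sketch of the information requested above, the following shows one possible shape for a per-component health record that separates the control plane from the data plane and records where a degradation originates. Every name here (`ComponentHealth`, `plane`, `origin`, `degraded_by_plane`) and every sample value is hypothetical; none of them is an existing HyperShift or OpenShift API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ComponentHealth:
    """Hypothetical health record for one component on one plane."""
    component: str
    plane: str                     # "control" or "data"
    healthy: bool
    cause: Optional[str] = None    # degradation cause, if any
    origin: Optional[str] = None   # "service-provider" or "customer"


def degraded_by_plane(records: List[ComponentHealth]) -> Dict[str, List[str]]:
    """Group unhealthy components by the plane they run on."""
    result: Dict[str, List[str]] = {"control": [], "data": []}
    for r in records:
        if not r.healthy:
            result[r.plane].append(f"{r.component}: {r.cause} ({r.origin})")
    return result


# Hypothetical sample data for a component that runs on both planes.
records = [
    ComponentHealth("cloud-controller-manager", "control", True),
    ComponentHealth("cloud-controller-manager", "data", False,
                    cause="example cause", origin="customer"),
]
print(degraded_by_plane(records))
```

Keeping `plane` and `origin` as separate fields is one way to let a service provider see at a glance whether a degradation sits on its own side or stems from the HostedCluster user's environment.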
3. Why does the customer need this? (List the business requirements here)
General observability of HCP.
4. List any affected packages or components.
- relates to: OCPBUGS-59414 No CVO metrics for etcd (New)