Resolution: Unresolved
ACM 2.10.0, ACM 2.9.0, ACM 2.11.0, ACM 2.12.0
Note: Doc team updates the current version of the documentation and the
two previous versions (n-2), but we address *only high-priority, or
customer-reported issues* for -2 releases in support.
Describe the changes in the doc and link to your dev story:
1. - [X] Mandatory: Add the required version to the Fix version/s field.
2. - [X] Mandatory: Choose the type of documentation change or review.
- [X] We need to update to an existing topic
- [ ] We need to add a new document to an existing section
- [ ] We need a whole new section; this is a function not
documented before and doesn't belong in any current section
- [ ] We need an Operator Advisory review and approval
- [ ] We need a z-Stream (Errata) Advisory and Release note
for MCE and/or ACM
3. - [X] *Mandatory: *Use the following link to open the doc and find where the
documentation update should go. Note: As the feature and doc is
understood and developed, this placement decision may change:
- Published doc: https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes/2.10
- Source: https://github.com/stolostron/rhacm-docs
Update should go for all versions under Observability documentation, in "Customize observability configuration" > "Adding custom metrics" aka in observability/customize_observability.adoc
4. - [ ] Mandatory for GA content:
- [ ] Add steps, the diff, known issue, and/or other important
conceptual information in the following space:
- [ ] *Add Required access level *(example, *Cluster
Administrator*) for the user to complete the task:
- [ ] Add verification at the end of the task, how does the user
verify success (a command to run or a result to see?)
- [ ] Add link to dev story here:
5. - [X] Mandatory for bugs: What is the diff? Clearly define what the
problem is, what the change is, and link to the current documentation. Only
use this for a documentation bug.
There are several problems with the documentation. In agreement with engineering, here is my proposal to replace the chapter entirely. I tried to keep the format consistent with observability/customize_observability.adoc while using the information I published in the article https://access.redhat.com/solutions/7099641
== Adding custom metrics
To monitor metrics from a remote cluster using RHACM, you first need to know whether the metric is exported as a `platform` or a `user workload` metric. This should be documented for the solution you want to monitor, or it should be something that support for that product can tell you.
If information on how to monitor your solution is not available, you can identify the type of metric in the console of the cluster. Under `Observe > Metrics`, the `prometheus` column shows where the metric originates from: `user workload` metrics are identified as `openshift-user-workload-monitoring`, while `platform` metrics are listed under the component that provides them.
You can also look at the `ServiceMonitor` for the observed resource and check which annotation it uses:
- `operator.prometheus.io/controller-id: openshift-user-workload-monitoring/prometheus-operator` means this is `user workload`
- `operator.prometheus.io/controller-id: openshift-platform-monitoring/prometheus-operator` means this is `platform`
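As an illustration of the check above, the annotation would sit in the metadata of the `ServiceMonitor`, roughly as in the following sketch. The resource name `example-app` and the namespace `example-namespace` are hypothetical placeholders; only the annotation key and value are taken from the list above.

```yaml
# Illustrative fragment only: the name and namespace are hypothetical.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app             # hypothetical
  namespace: example-namespace  # hypothetical
  annotations:
    # This value identifies a user workload metric.
    operator.prometheus.io/controller-id: openshift-user-workload-monitoring/prometheus-operator
# spec omitted for brevity
```

A value starting with `openshift-platform-monitoring/` in the same position would instead point to a `platform` metric.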
After you know which type of metric you need to set up RHACM to monitor, follow the steps for that type of metric.
=== Adding Platform metrics
Platform metrics can be monitored by creating a `ConfigMap` named `observability-metrics-custom-allowlist` on the hub cluster in the `open-cluster-management-observability` namespace. Format it as in this example:
kind: ConfigMap
apiVersion: v1
metadata:
  name: observability-metrics-custom-allowlist
  namespace: open-cluster-management-observability
data:
  metrics_list.yaml: |
    names: <1>
      - node_memory_MemTotal_bytes
    rules: <2>
      - record: apiserver_request_duration_seconds:histogram_quantile_90
        expr: histogram_quantile(0.90,sum(rate(apiserver_request_duration_seconds_bucket{job="apiserver", verb!="WATCH"}[5m])) by (verb,le))
<1> Optional: Add the name of the custom metrics that are to be collected from the managed cluster.
<2> Optional: Enter only one value for the `expr` and `record` parameter pair to define the query expression. The metrics are collected from your managed cluster under the name that is defined in the `record` parameter. The metric values returned are the results of running the query expression.
You can use either one or both of the sections.
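Because both sections are optional, an allowlist that only collects metrics by name, with no recording rules, could look like the following sketch. It reuses the metric name from the example above and assumes the usual `metadata` and `data` layout of a `ConfigMap`.

```yaml
# Sketch: names-only allowlist on the hub cluster, no rules section.
kind: ConfigMap
apiVersion: v1
metadata:
  name: observability-metrics-custom-allowlist
  namespace: open-cluster-management-observability
data:
  metrics_list.yaml: |
    names:
      - node_memory_MemTotal_bytes
```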
A `ConfigMap` created on the hub cluster applies to every managed cluster with observability enabled. If you want to use this configuration for only one cluster, you can instead create a similar `ConfigMap` directly on the spoke cluster, in the namespace where the `endpoint-observability-operator` is deployed, `open-cluster-management-addon-observability`:
kind: ConfigMap
apiVersion: v1
metadata:
  name: observability-metrics-custom-allowlist
  namespace: open-cluster-management-addon-observability
data:
  metrics_list.yaml: |
    names: <1>
      - node_memory_MemTotal_bytes
    rules: <2>
      - record: apiserver_request_duration_seconds:histogram_quantile_90
        expr: histogram_quantile(0.90,sum(rate(apiserver_request_duration_seconds_bucket{job="apiserver", verb!="WATCH"}[5m])) by (verb,le))
<1> Optional: Add the name of the custom metrics that are to be collected from the managed cluster.
<2> Optional: Enter only one value for the `expr` and `record` parameter pair to define the query expression. The metrics are collected from your managed cluster under the name that is defined in the `record` parameter. The metric values returned are the results of running the query expression.
You can use either one or both of the sections.
=== Adding user workload metrics
For this type of metric, collection is handled by a different collector. You need to create the configuration on the spoke cluster itself, in the namespace where the metric has to be captured. The `ConfigMap` must be named `observability-metrics-custom-allowlist` and can be formatted as follows:
kind: ConfigMap
apiVersion: v1
metadata:
  name: observability-metrics-custom-allowlist
  namespace: monitored_namespace <1>
data:
  uwl_metrics_list.yaml: | <2>
    names: <3>
      - sample_metrics
<1> Enter the namespace where the metric is captured from.
<2> Enter the key for the config map data.
<3> Enter the value of the config map data in YAML format. The `names` section lists the metric names that you want to collect from the `monitored_namespace` namespace. After you create the config map, the observability collector collects the metrics from the target namespace and pushes them to the hub cluster.
This example monitors the user workload metric `sample_metrics` from the namespace `monitored_namespace`. If you instead create this configuration in the `open-cluster-management-addon-observability` namespace, the metric is collected from *all* the namespaces of the spoke cluster.
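A cluster-wide variant of the same configuration might look like the following sketch. It assumes that the add-on namespace on the spoke cluster is `open-cluster-management-addon-observability`, as used in the platform example earlier in this section, and it keeps the placeholder metric name `sample_metrics`.

```yaml
# Sketch: user workload allowlist collected from all namespaces of the spoke cluster.
kind: ConfigMap
apiVersion: v1
metadata:
  name: observability-metrics-custom-allowlist
  # Assumed add-on namespace; adjust if your deployment differs.
  namespace: open-cluster-management-addon-observability
data:
  uwl_metrics_list.yaml: |
    names:
      - sample_metrics
```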