-
Bug
-
Resolution: Unresolved
-
Normal
-
ACM 2.10.0, ACM 2.9.0, ACM 2.11.0, ACM 2.12.0
-
False
-
None
-
False
-
-
-
None
Note: The doc team updates the current version of the documentation and the
two previous versions (n-2), but we address *only high-priority or
customer-reported issues* for n-2 releases in support.
Describe the changes in the doc and link to your dev story:
1. - [X] Mandatory: Add the required version to the Fix version/s field.
2. - [X] Mandatory: Choose the type of documentation change or review.
- [X] We need to update to an existing topic
- [ ] We need to add a new document to an existing section
- [ ] We need a whole new section; this is a function not
documented before and doesn't belong in any current section
- [ ] We need an Operator Advisory review and approval
- [ ] We need a z-Stream (Errata) Advisory and Release note
for MCE and/or ACM
3. - [X] *Mandatory:* Use the following link to open the doc and find where the
documentation update should go. Note: As the feature and doc are
understood and developed, this placement decision may change:
- Published doc: https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes/2.10
- Source: https://github.com/stolostron/rhacm-docs
The update should go into all versions, under the Observability documentation, in "Customize observability configuration" > "Adding custom metrics", that is, in observability/customize_observability.adoc.
4. - [ ] Mandatory for GA content:
- [ ] Add steps, the diff, known issue, and/or other important
conceptual information in the following space:
- [ ] *Add Required access level* (example, *Cluster
Administrator*) for the user to complete the task:
- [ ] Add verification at the end of the task, how does the user
verify success (a command to run or a result to see?)
- [ ] Add link to dev story here:
5. - [X] Mandatory for bugs: What is the diff? Clearly define what the
problem is, what the change is, and link to the current documentation. Only
use this for a documentation bug.
There are several problems with the documentation. In agreement with engineering, here is my proposal for replacing the chapter entirely. I tried to keep the format consistent with what was in observability/customize_observability.adoc, while using the data I published in the article https://access.redhat.com/solutions/7099641
[#adding-custom-metrics]
== Adding custom metrics
To monitor metrics from a remote cluster by using RHACM, you first need to know whether the metric is exported as a `platform` or a `user workload` metric. This should be documented for the solution that you want to monitor, or be something that support for that product can tell you.
+
If information on how to monitor your solution is not available, you can identify the type of metric from the console of the cluster: under `Observe > Metrics`, the `prometheus` column shows where the metric originates from; `user workload` metrics are identified as `openshift-user-workload-monitoring`, while `platform` metrics are listed under the component that provides them.
You can also look at the `ServiceMonitor` for the observed resource and check which annotation it uses:
- `operator.prometheus.io/controller-id: openshift-user-workload-monitoring/prometheus-operator` means this is `user workload`
- `operator.prometheus.io/controller-id: openshift-platform-monitoring/prometheus-operator` means this is `platform`
+
After you know what type of metric you need to set up RHACM to monitor, follow the steps from the appropriate documentation.
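+
For example, a quick way to check the annotation (a sketch; `<namespace>` and `<servicemonitor-name>` are placeholders for your workload) is:
+
[source,bash]
----
# Print the controller-id annotation of the ServiceMonitor; the value shows
# whether user workload monitoring or platform monitoring scrapes this resource.
oc -n <namespace> get servicemonitor <servicemonitor-name> \
  -o jsonpath='{.metadata.annotations.operator\.prometheus\.io/controller-id}'
----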
[#adding-platform-metrics]
=== Adding Platform metrics
Platform metrics can be monitored by creating a `ConfigMap` named `observability-metrics-custom-allowlist` in the `open-cluster-management-observability` namespace on the hub cluster. It needs to be formatted as in this example:
+
[source,yaml]
----
kind: ConfigMap
apiVersion: v1
metadata:
  name: observability-metrics-custom-allowlist
  namespace: open-cluster-management-observability
data:
  metrics_list.yaml: |
    names: <1>
      - node_memory_MemTotal_bytes
    rules: <2>
      - record: apiserver_request_duration_seconds:histogram_quantile_90
        expr: histogram_quantile(0.90,sum(rate(apiserver_request_duration_seconds_bucket{job="apiserver", verb!="WATCH"}[5m])) by (verb,le))
----
+
<1> Optional: Add the names of the custom metrics that are to be collected from the managed cluster.
<2> Optional: Enter only one value for the `expr` and `record` parameter pair to define the query expression. The metrics are collected under the name that is defined in the `record` parameter from your managed cluster. The metric values that are returned are the results after you run the query expression.
+
You can use either one or both of the sections.
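+
As a minimal sketch (assuming you saved the example above as `observability-metrics-custom-allowlist.yaml`, a hypothetical file name), you can apply the config map and confirm it exists on the hub cluster:
+
[source,bash]
----
# Create or update the allowlist on the hub cluster
oc apply -f observability-metrics-custom-allowlist.yaml

# Confirm the config map exists in the expected namespace
oc -n open-cluster-management-observability get configmap observability-metrics-custom-allowlist -o yaml
----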
+
This applies to every managed cluster where observability is enabled. If you want to use this configuration for only one cluster, you can instead create a similar configuration directly on the spoke cluster, in the same namespace where the `endpoint-observability-operator` is deployed, `open-cluster-management-addon-observability`:
+
[source,yaml]
----
kind: ConfigMap
apiVersion: v1
metadata:
  name: observability-metrics-custom-allowlist
  namespace: open-cluster-management-addon-observability
data:
  metrics_list.yaml: |
    names: <1>
      - node_memory_MemTotal_bytes
    rules: <2>
      - record: apiserver_request_duration_seconds:histogram_quantile_90
        expr: histogram_quantile(0.90,sum(rate(apiserver_request_duration_seconds_bucket{job="apiserver", verb!="WATCH"}[5m])) by (verb,le))
----
+
<1> Optional: Add the names of the custom metrics that are to be collected from the managed cluster.
<2> Optional: Enter only one value for the `expr` and `record` parameter pair to define the query expression. The metrics are collected under the name that is defined in the `record` parameter from your managed cluster. The metric values that are returned are the results after you run the query expression.
+
You can use either one or both of the sections.
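+
Similarly, as a sketch (assuming the spoke-local example above is saved as `allowlist-spoke.yaml`, a hypothetical file name), apply it while logged in to the managed cluster rather than the hub:
+
[source,bash]
----
# Run these commands against the managed (spoke) cluster, not the hub
oc apply -f allowlist-spoke.yaml
oc -n open-cluster-management-addon-observability get configmap observability-metrics-custom-allowlist
----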
[#adding-user-workload-metrics]
=== Adding user workload metrics
For this type of metric, collection is handled by a different collector. You need to create the configuration on the spoke cluster itself, in the namespace where the metric has to be captured. It needs to be named `observability-metrics-custom-allowlist` and can be formatted as follows:
+
[source,yaml]
----
kind: ConfigMap
apiVersion: v1
metadata:
  name: observability-metrics-custom-allowlist
  namespace: monitored_namespace <1>
data:
  uwl_metrics_list.yaml: | <2>
    names: <3>
      - sample_metrics
----
+
<1> Enter the namespace where the metrics are captured from.
<2> Enter the key for the config map data.
<3> Enter the value of the config map data in YAML format. The `names` section includes the list of metric names that you want to collect from that namespace. After you create the config map, the observability collector collects and pushes the metrics from the target namespace to the hub cluster.
+
This example monitors the user workload metric `sample_metrics` from the namespace `monitored_namespace`. If this configuration is instead created in the `open-cluster-management-addon-observability` namespace, the metric is collected from *all* the namespaces of the spoke cluster.
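+
As a minimal verification sketch (using the example names above), you can confirm that the config map landed in the monitored namespace on the spoke cluster before looking for the metric on the hub:
+
[source,bash]
----
# Confirm the allowlist config map exists in the monitored namespace
oc -n monitored_namespace get configmap observability-metrics-custom-allowlist -o yaml
----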