  1. Red Hat Advanced Cluster Management
  2. ACM-16419

Upgrade the custom metric documentation to include better details


      Note: Doc team updates the current version of the documentation and the
      two previous versions (n-2), but we address *only high-priority, or
      customer-reported issues* for -2 releases in support.
      Describe the changes in the doc and link to your dev story:

      1. - [X] Mandatory: Add the required version to the Fix version/s field.

      2. - [X] Mandatory: Choose the type of documentation change or review.

      • [X] We need to update to an existing topic
      • [ ] We need to add a new document to an existing section
      • [ ] We need a whole new section; this is a function not
        documented before and doesn't belong in any current section
      • [ ] We need an Operator Advisory review and approval
      • [ ] We need a z-Stream (Errata) Advisory and Release note
        for MCE and/or ACM

      3. - [X] *Mandatory:* Use the following link to open the doc and find where the
      documentation update should go. Note: As the feature and doc are
      understood and developed, this placement decision may change:

      The update should go in all versions of the Observability documentation, under "Customize observability configuration" > "Adding custom metrics", that is, in observability/customize_observability.adoc

      4. - [ ] Mandatory for GA content:

      • [ ] Add steps, the diff, known issue, and/or other important
        conceptual information in the following space:
      • [ ] *Add required access level* (example, *Cluster
        Administrator*) for the user to complete the task:
      • [ ] Add verification at the end of the task, how does the user
        verify success (a command to run or a result to see?)
      • [ ] Add link to dev story here:

      5. - [X] Mandatory for bugs: What is the diff? Clearly define what the
      problem is, what the change is, and link to the current documentation. Only
      use this for a documentation bug.

      There are several problems with the documentation. In agreement with engineering, here is my proposal for replacing the chapter entirely. I tried to keep the format consistent with what is in observability/customize_observability.adoc while using the data I published in the article https://access.redhat.com/solutions/7099641

      [#adding-custom-metrics]
      == Adding custom metrics

      To monitor metrics from a remote cluster using RHACM, you first need to know whether the metric is exported as a `platform` or a `user workload` metric. This should be documented for the solution you want to monitor, or be something that support for that product can tell you.
      +
      If information on how to monitor your solution is not available, you can identify the type of metric from the console of the cluster: under `Observe > Metrics`, the `prometheus` column shows where the metric originates. `user workload` metrics are identified as `openshift-user-workload-monitoring`, while `platform` metrics are listed under the component that provides them.
      You can also look at the `ServiceMonitor` for the observed resource and see which annotation it uses:

      • `operator.prometheus.io/controller-id: openshift-user-workload-monitoring/prometheus-operator` means this is `user workload`
      • `operator.prometheus.io/controller-id: openshift-platform-monitoring/prometheus-operator` means this is `platform`
      +
      After you know which type of metric you need to set up RHACM to monitor, follow the steps in the corresponding section.

      [#adding-platform-metrics]
      === Adding Platform metrics

      To monitor platform metrics, create a `ConfigMap` named `observability-metrics-custom-allowlist` on the hub cluster in the `open-cluster-management-observability` namespace. Structure it as in this example:

      +
      [source,yaml]
      ----
      kind: ConfigMap
      apiVersion: v1
      metadata:
        name: observability-metrics-custom-allowlist
        namespace: open-cluster-management-observability
      data:
        metrics_list.yaml: |
          names: <1>
            - node_memory_MemTotal_bytes
          rules: <2>
          - record: apiserver_request_duration_seconds:histogram_quantile_90
            expr: histogram_quantile(0.90,sum(rate(apiserver_request_duration_seconds_bucket{job="apiserver", verb!="WATCH"}[5m])) by (verb,le))
      ----
      +
      <1> Optional: Add the names of the custom metrics to collect from the managed cluster.
      <2> Optional: Enter only one value for each `expr` and `record` parameter pair to define the query expression. The metric is collected from your managed cluster under the name defined in the `record` parameter. The metric value returned is the result of running the query expression.
      +
      You can use the `names` section, the `rules` section, or both.
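      +
      For instance, a config map that only collects additional metrics by name, with no recording rules, might look like the following minimal sketch. It reuses the metric name from the example above and is an illustration, not a required layout:
      +
      [source,yaml]
      ----
      kind: ConfigMap
      apiVersion: v1
      metadata:
        name: observability-metrics-custom-allowlist
        namespace: open-cluster-management-observability
      data:
        metrics_list.yaml: |
          # Only the names section is present; the rules section is omitted.
          names:
            - node_memory_MemTotal_bytes
      ----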
        +
      This configuration applies to every managed cluster with observability enabled. If you want to use this configuration for only one cluster, you can instead create a similar configuration directly on the spoke cluster, in the namespace where the `endpoint-observability-operator` is deployed, `open-cluster-management-addon-observability`:
        +
      [source,yaml]
      ----
      kind: ConfigMap
      apiVersion: v1
      metadata:
        name: observability-metrics-custom-allowlist
        namespace: open-cluster-management-addon-observability
      data:
        metrics_list.yaml: |
          names: <1>
            - node_memory_MemTotal_bytes
          rules: <2>
          - record: apiserver_request_duration_seconds:histogram_quantile_90
            expr: histogram_quantile(0.90,sum(rate(apiserver_request_duration_seconds_bucket{job="apiserver", verb!="WATCH"}[5m])) by (verb,le))
      ----
      +
      <1> Optional: Add the names of the custom metrics to collect from the managed cluster.
      <2> Optional: Enter only one value for each `expr` and `record` parameter pair to define the query expression. The metric is collected from your managed cluster under the name defined in the `record` parameter. The metric value returned is the result of running the query expression.
      +
      You can use the `names` section, the `rules` section, or both.
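      +
      As a complementary sketch, a config map on a single spoke cluster that only defines a recording rule, with no `names` section, could look like the following. It reuses the rule and the namespace from the example above; nothing else is assumed:
      +
      [source,yaml]
      ----
      kind: ConfigMap
      apiVersion: v1
      metadata:
        name: observability-metrics-custom-allowlist
        namespace: open-cluster-management-addon-observability
      data:
        metrics_list.yaml: |
          # Only the rules section is present; the names section is omitted.
          rules:
          - record: apiserver_request_duration_seconds:histogram_quantile_90
            expr: histogram_quantile(0.90,sum(rate(apiserver_request_duration_seconds_bucket{job="apiserver", verb!="WATCH"}[5m])) by (verb,le))
      ----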

      [#adding-user-workload-metrics]
      === Adding user workload metrics

      For this type of metric, configuration is performed by a different collector. You need to create the configuration on the spoke cluster itself, in the namespace where the metric has to be captured. The `ConfigMap` must be named `observability-metrics-custom-allowlist` and can be formatted as follows:
      +
      [source,yaml]
      ----
      kind: ConfigMap
      apiVersion: v1
      metadata:
        name: observability-metrics-custom-allowlist
        namespace: monitored_namespace <1>
      data:
        uwl_metrics_list.yaml: | <2>
          names: <3>
            - sample_metrics
      ----
      +
      <1> Enter the namespace where the metric is captured from.
      <2> Enter the key for the config map data.
      <3> Enter the value of the config map data in YAML format. The `names` section lists the metric names that you want to collect from the `monitored_namespace` namespace. After you create the config map, the observability collector collects the metrics from the target namespace and pushes them to the hub cluster.
        +
      This example monitors the user workload metric `sample_metrics` from the namespace `monitored_namespace`. If this configuration is instead created in the `open-cluster-management-addon-observability` namespace, the metric is collected from *all* the namespaces of the spoke cluster.
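      +
      A minimal sketch of that cluster-wide variant follows. It assumes the add-on namespace is `open-cluster-management-addon-observability`, the name used in the spoke cluster example for platform metrics; adjust it if the add-on runs in a different namespace on your cluster:
      +
      [source,yaml]
      ----
      kind: ConfigMap
      apiVersion: v1
      metadata:
        name: observability-metrics-custom-allowlist
        # Assumption: namespace where the observability add-on is deployed on the spoke cluster.
        namespace: open-cluster-management-addon-observability
      data:
        uwl_metrics_list.yaml: |
          # Collected from all namespaces when the config map is created in the add-on namespace.
          names:
            - sample_metrics
      ----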

              rh-ee-ofischer Oliver Fischer
              rhn-support-fdewaley Felix Dewaleyne