Red Hat Advanced Cluster Management / ACM-10032

[doc] ACM Hub Metrics Collection



      Create an informative issue (See each section, incomplete templates/issues won't be triaged)

      Using the current documentation as a model, please complete the issue template. 

      Note: Doc team updates the current version and the two previous versions (n-2). For earlier versions, we will address only high-priority, customer-reported issues for releases in support.

      Prerequisite: Start with what we have

      Always look at the current documentation to describe the change that is needed. Use the source or portal link for Step 4:

       - Use the Customer Portal: https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes

       - Use the GitHub link to find the staged docs in the repository: https://github.com/stolostron/rhacm-docs 

      Describe the changes in the doc and link to your dev story

      Provide info for the following steps:

      1. - [x] Mandatory Add the required version to the Fix version/s field.

      2. - [x] Mandatory Choose the type of documentation change.

            - [x] New topic in an existing section or new section
            - [ ] Update to an existing topic

      3. - [ ] Mandatory for GA content:
                  
             - [x] Add steps and/or other important conceptual information here: 
             
      1. When Observability is configured and enabled, metrics from an ACM hub cluster are always collected and pushed into ACM Observability, regardless of whether hub self-management is enabled.
      2. Similarly, when alerting is enabled, alerts from the ACM hub cluster are always propagated to the ACM Alertmanager, regardless of whether hub self-management is enabled.
      3. Metrics and alerts for the hub cluster continue to appear under local-cluster, even when no managed cluster object with that name exists.
      4. Customers can see managed cluster labels and use label filtering to pick spoke clusters on the ACM Grafana Overview dashboard page, whether or not hub self-management is enabled.
      4a. Customers will see local-cluster (hub) in the cluster list drop-down only if hub self-management is enabled.
      4b. Customers can query local-cluster (hub) metrics in the ACM Grafana Explore view, whether or not hub self-management is enabled.
      5. ACM Observability no longer appears as an add-on for the hub cluster in the ACM console, because it is always on.
      6. ACM customers migrating from previous releases will notice the following changes on the hub cluster:
          a) The ACM Observability add-on on the hub cluster is removed.
          b) The corresponding pods (endpoint-observability-operator and metrics-collector) are removed from the open-cluster-management-addon-observability namespace.
          c) The endpoint operator and metrics-collector are now launched and managed directly by the MCO operator in the open-cluster-management-observability namespace.

      7. NO ACTION FROM THE CUSTOMER IS NECESSARY to enable this feature. There is no functional regression.

      8. THIS CHANGE AFFECTS ACM HUBS ONLY. THERE IS NO CHANGE IN BEHAVIOR OR FUNCTION FOR ANY SPOKE CLUSTER.
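Item 6c can be spot-checked on an upgraded hub. The following is a minimal Python sketch, not an official tool: it lists pod names in the open-cluster-management-observability namespace with the `kubernetes` Python client and reports which of the two components have no matching pod. Matching pods by name prefix is an assumption; the actual pod names on a given release may differ.

```python
# Sketch: check that the hub's endpoint components run in the MCO namespace.
# Assumptions (not from ACM docs): pods are matched by name prefix, and the
# `kubernetes` Python client is installed when list_pod_names() is called.

HUB_NAMESPACE = "open-cluster-management-observability"
COMPONENTS = ("endpoint-observability-operator", "metrics-collector")


def missing_components(pod_names, components=COMPONENTS):
    """Return the components that have no pod whose name starts with the component name."""
    return [
        component
        for component in components
        if not any(name.startswith(component) for name in pod_names)
    ]


def list_pod_names(namespace=HUB_NAMESPACE):
    """Return the names of all pods in `namespace`, using the current kubeconfig context."""
    # Imported here so missing_components() can be used without the client installed.
    from kubernetes import client, config

    config.load_kube_config()
    pods = client.CoreV1Api().list_namespaced_pod(namespace)
    return [pod.metadata.name for pod in pods.items]
```

On a hub with a valid kubeconfig, `missing_components(list_pod_names())` returns an empty list when both components have a pod in that namespace. Note that a `uwl-metrics-collector` pod does not match the `metrics-collector` prefix.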
                  
             - [ ] Add the required access level for the user to complete the task here:

             - [ ] Add verification at the end of the task, how does the user verify success (a command to run or a result to see?)
          a) ACM customers should see managed cluster labels even when hub self-management is disabled.
          b) ACM customers should see metrics from local-cluster in the Grafana Explore view even when hub self-management is disabled.
          c) ACM customers should see alerts from local-cluster even when hub self-management is disabled.
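Verification step b) can also be scripted. This is a hypothetical sketch, assuming hub metrics are reachable through a Prometheus-compatible HTTP API (`/api/v1/query`) at a base URL you supply, and that series from the hub carry a `cluster="local-cluster"` label. The metric name `up` is only an example; authentication and TLS settings are omitted.

```python
# Sketch: build a query for local-cluster series and check the response.
# Assumptions: a Prometheus-compatible /api/v1/query endpoint, a `cluster`
# label on hub series, and `up` as an example metric name.
from urllib.parse import urlencode


def local_cluster_query_url(base_url, metric="up", cluster="local-cluster"):
    """Build an instant-query URL that selects `metric` for a single cluster."""
    query = f'{metric}{{cluster="{cluster}"}}'
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': query})}"


def has_series(response_json):
    """Return True if a Prometheus instant-query response contains at least one series."""
    return (
        response_json.get("status") == "success"
        and len(response_json.get("data", {}).get("result", [])) > 0
    )
```

Fetch the URL with any HTTP client that carries your credentials, parse the JSON body, and pass it to `has_series`; `True` indicates that local-cluster metrics are queryable.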

      9. metrics-collector and uwl-metrics-collector are scraped by the in-cluster Prometheus of the Cluster Monitoring Operator (CMO) and expose the following metrics (the uwl_ infix applies to uwl-metrics-collector):

        a) acm_(uwl_)metrics_collector_federate_requests_total: the number of federation requests to the CMO Prometheus, with the response status code as a label.

        b) acm_(uwl_)metrics_collector_forward_write_request_total: the number of remote write requests sent to the hub Observatorium API, with the response status code as a label.

        c) Other, less important metrics, such as federate_samples, a gauge of how many samples the in-cluster Prometheus reports.

      10. metrics-collector and uwl-metrics-collector also have two alerts that fire locally on the cluster and reflect their operational state:

        a) ACM(UWL)MetricsCollectorFederationError: metrics-collector cannot federate from the CMO Prometheus properly, and the federation error rate is high.

        b) ACM(UWL)MetricsCollectorForwardRemoteWriteError: the remote write error rate is high, and metrics-collector cannot reach the hub properly.

        All of these alerts have critical severity.
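The error-rate idea behind these alerts can be illustrated with the status-code-labeled request counters. This minimal sketch, assuming that any status code not starting with "2" counts as an error, computes the share of failed requests between two samples of such a counter. The 0.2 threshold in `should_fire` is a hypothetical value, not the threshold used by the shipped alerts.

```python
# Sketch: error ratio for a request counter labeled by response status code.
# Assumptions: non-2xx codes are errors; the 0.2 threshold is hypothetical.


def error_ratio(before, after):
    """Fraction of requests in the window with a non-2xx status code.

    `before` and `after` map status code strings (for example "200") to
    counter values sampled at the start and end of the window.
    """
    total = errors = 0.0
    for code, end_value in after.items():
        delta = end_value - before.get(code, 0.0)
        if delta < 0:
            # Counter reset: assume it restarted from zero, as Prometheus does.
            delta = end_value
        total += delta
        if not code.startswith("2"):
            errors += delta
    return errors / total if total else 0.0


def should_fire(ratio, threshold=0.2):
    """Hypothetical alert condition: error ratio strictly above the threshold."""
    return ratio > threshold
```

For example, if 80 federation requests return 200 and 20 return 500 in the window, the ratio is 0.2, which does not exceed the hypothetical threshold.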

           
             - [ ] Add link to dev story here:
      https://issues.redhat.com/browse/ACM-8509

      4. - [ ] Mandatory for bugs: What is the diff? Clearly define what the problem is, what the change is, and link to the current documentation:

            Oliver Fischer (rh-ee-ofischer)
            Subbarao Meduri (smeduri1@redhat.com)
            Daniel Mohr
            Subbarao Meduri
            Coleen Iona Quadros
            ACM Observability Core