Red Hat Advanced Cluster Management: ACM-8498

New feature: Compactor alerts for Multicluster Observability



      Create an informative issue (See each section, incomplete templates/issues won't be triaged)

      Using the current documentation as a model, please complete the issue template. 

      Note: Doc team updates the current version and the two previous versions (n-2). For earlier versions, we will address only high-priority, customer-reported issues for releases in support.

      Prerequisite: Start with what we have

      Always look at the current documentation to describe the change that is needed. Use the source or portal link for Step 4:

       - Use the Customer Portal: https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes

       - Use the GitHub link to find the staged docs in the repository: https://github.com/stolostron/rhacm-docs 

      Describe the changes in the doc and link to your dev story

      Provide info for the following steps:

      1. - [x] Mandatory Add the required version to the Fix version/s field.

      2. - [x] Mandatory Choose the type of documentation change.

            - [x] New topic in an existing section or new section:

       

      I believe this belongs in the Observability section of the ACM 2.9 release notes: https://github.com/stolostron/rhacm-docs/blob/2.9_stage/release_notes/whats_new.adoc#observability.

            - [ ] Update to an existing topic

      3. - [x] Mandatory for GA content:
                  
             - [x] Add steps and/or other important conceptual information here: 

      The Thanos Compactor is an important part of the ACM Observability product. It is deployed by the Multicluster Observability Operator (MCO). It keeps queries performant by enforcing the retention configuration and compacting the data in object storage. For the MCO to provide a good query experience, the Thanos Compactor must be healthy.

      To help customers identify when the Thanos Compactor has issues, the MCO now includes four default alerts that monitor its health, each with its own severity:

      • ACMThanosCompactHalted, critical: fires if the Compactor is halted.
      • ACMThanosCompactHighCompactionFailures, warning: fires if the compaction failure rate exceeds 5%.
      • ACMThanosCompactBucketHighOperationFailures, warning: fires if the bucket operation failure rate exceeds 5%.
      • ACMThanosCompactHasNotRun, warning: fires if the Compactor has not uploaded anything in the last 24 hours.

      For more details about these rules, see the upstream rule definitions at https://github.com/stolostron/multicluster-observability-operator/blob/main/operators/multiclusterobservability/manifests/base/alertmanager/prometheusrule.yaml.
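      The following Python sketch restates the four alert conditions as plain code. It may help when reasoning about why a given alert fired. It is not the MCO implementation, and several parts of it are assumptions:
      • The metric names in the comments are taken from the upstream Thanos project.
      • `CompactorSnapshot`, `failure_pct`, and `firing_alerts` are hypothetical names.
      • The authoritative rule expressions, including any `for:` durations and label selectors, are the ones in prometheusrule.yaml.

```python
# Hypothetical sketch: restates the four default Compactor alert conditions.
# Metric names in the comments are assumed from upstream Thanos; the real ACM
# rules in prometheusrule.yaml may use different expressions and durations.
import time
from dataclasses import dataclass
from typing import List, Optional

FAILURE_RATE_THRESHOLD_PCT = 5.0          # "> 5%" in the alert descriptions
NOT_RUN_THRESHOLD_SECONDS = 24 * 60 * 60  # "in the last 24 hours"


@dataclass
class CompactorSnapshot:
    """Hypothetical per-Compactor values, as a Prometheus query might return them."""
    halted: bool                     # assumed: thanos_compact_halted == 1
    compaction_failures_rate: float  # assumed: rate of thanos_compact_group_compactions_failures_total
    compactions_rate: float          # assumed: rate of thanos_compact_group_compactions_total
    bucket_op_failures_rate: float   # assumed: rate of thanos_objstore_bucket_operation_failures_total
    bucket_ops_rate: float           # assumed: rate of thanos_objstore_bucket_operations_total
    last_successful_upload: float    # assumed: thanos_objstore_bucket_last_successful_upload_time (Unix s)


def failure_pct(failures: float, total: float) -> Optional[float]:
    """Failure rate as a percentage, or None when there was no activity.

    In PromQL, 0/0 yields NaN and NaN > 5 is false, so an idle Compactor
    does not trigger the failure-rate alerts; None models that case here.
    """
    if total <= 0:
        return None
    return failures / total * 100.0


def firing_alerts(s: CompactorSnapshot, now: Optional[float] = None) -> List[str]:
    """Return the names of the alerts whose conditions hold for snapshot s."""
    now = time.time() if now is None else now
    alerts: List[str] = []
    if s.halted:
        alerts.append("ACMThanosCompactHalted")
    pct = failure_pct(s.compaction_failures_rate, s.compactions_rate)
    if pct is not None and pct > FAILURE_RATE_THRESHOLD_PCT:
        alerts.append("ACMThanosCompactHighCompactionFailures")
    pct = failure_pct(s.bucket_op_failures_rate, s.bucket_ops_rate)
    if pct is not None and pct > FAILURE_RATE_THRESHOLD_PCT:
        alerts.append("ACMThanosCompactBucketHighOperationFailures")
    if now - s.last_successful_upload > NOT_RUN_THRESHOLD_SECONDS:
        alerts.append("ACMThanosCompactHasNotRun")
    return alerts


if __name__ == "__main__":
    now = 1_700_000_000.0
    # 0% compaction failures, 0.1% bucket failures, last upload 1 hour ago.
    healthy = CompactorSnapshot(False, 0.0, 2.0, 0.01, 10.0, now - 3600)
    # Halted, 10% failure rates, last upload 2 days ago.
    broken = CompactorSnapshot(True, 0.2, 2.0, 1.0, 10.0, now - 2 * 86400)
    print(firing_alerts(healthy, now))  # []
    print(firing_alerts(broken, now))   # all four alert names
```

      The sketch keeps the threshold comparison strict (> 5%) to match the wording of the alert list above. The exact comparison and evaluation window used by ACM should be confirmed against prometheusrule.yaml.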

               
             - [ ] Add Required access level for the user to complete the task here:
             

             - [ ] Add verification at the end of the task, how does the user verify success (a command to run or a result to see?)
           
           
             - [x] Add link to dev story here: https://issues.redhat.com/browse/ACM-7362

      4. - [ ] Mandatory for bugs: What is the diff? Clearly define what the problem is, what the change is, and link to the current documentation:

            Mikela Jackson (mdockery@redhat.com)
            Douglas Camata (rh-ee-doolivei, Inactive)
            Xiang Yin