  1. Red Hat Advanced Cluster Management
  2. ACM-7321

Detect and alert when ACM compactor becomes unhealthy on the hub


    • Observability Sprint 2023-11, Observability Sprint 2023-15

      Value Statement

      For various reasons, the ACM compactor can become unhealthy and stop compacting. This can have disastrous consequences for the long-term health of the system. This story is to alert the OCP administrator when the compactor becomes unhealthy.

      The alerts are modeled after how RHOBS monitors compactor health here

      Specifically,

      ACMThanosCompactHalted, critical, [5m], fires if the compactor has halted

      ACMThanosCompactHighCompactionFailures, warning, [15m], fires if the compaction failure rate is > 5%

      ACMThanosCompactBucketHighOperationFailures, warning, [15m], fires if the bucket operation failure rate is > 5%

      ACMThanosCompactHasNotRun, warning, fires if the compactor has not uploaded anything in the last 24 hours.
                  The first time the rule is added, it delays its execution by 6 hours
                  (4 hours for receivers to create a block + 2 hours for the compactor to run).
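
      The four conditions above can be sketched as plain decision logic. This is a minimal illustration only, not the actual alerting rules, which would be written as Prometheus rules. The CompactorSample type, its field names, and the violated_conditions function are hypothetical names introduced for this example. The sketch also assumes that the 6-hour delay means the ACMThanosCompactHasNotRun check is suppressed until 6 hours after the rule is added. It evaluates each condition at a single instant and does not model the [5m] and [15m] hold durations.

```python
from dataclasses import dataclass
from typing import List

# Thresholds taken from the story text.
FAILURE_RATE_THRESHOLD = 0.05      # "> 5%"
NOT_RUN_THRESHOLD_HOURS = 24       # no upload in the last 24 hours
FIRST_RUN_GRACE_HOURS = 4 + 2      # 4h for receivers to create a block + 2h for the compactor to run


@dataclass
class CompactorSample:
    """Hypothetical snapshot of compactor health inputs (names are illustrative)."""
    halted: bool
    compactions: float
    compaction_failures: float
    bucket_operations: float
    bucket_operation_failures: float
    hours_since_last_upload: float
    hours_since_rule_added: float


def failure_rate(failures: float, total: float) -> float:
    """Return failures / total, treating 'no activity' as a zero failure rate."""
    return failures / total if total > 0 else 0.0


def violated_conditions(s: CompactorSample) -> List[str]:
    """Return the names of the alerts whose condition holds for this sample."""
    names = []
    if s.halted:
        names.append("ACMThanosCompactHalted")
    if failure_rate(s.compaction_failures, s.compactions) > FAILURE_RATE_THRESHOLD:
        names.append("ACMThanosCompactHighCompactionFailures")
    if failure_rate(s.bucket_operation_failures, s.bucket_operations) > FAILURE_RATE_THRESHOLD:
        names.append("ACMThanosCompactBucketHighOperationFailures")
    # Assumption: the check is skipped entirely during the initial 6-hour grace period.
    if (s.hours_since_rule_added >= FIRST_RUN_GRACE_HOURS
            and s.hours_since_last_upload > NOT_RUN_THRESHOLD_HOURS):
        names.append("ACMThanosCompactHasNotRun")
    return names
```

      For example, a sample with 10 failures out of 100 compactions exceeds the 5% threshold and would be reported as ACMThanosCompactHighCompactionFailures, while a sample taken 3 hours after the rule was added would never report ACMThanosCompactHasNotRun under the grace-period assumption.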

      jbanerje@redhat.com sberens@redhat.com - please review and comment.

      Definition of Done for Engineering Story Owner (Checklist)

      • ...

      Development Complete

      • The code is complete.
      • Functionality is working.
      • Any required downstream Docker file changes are made.

      Tests Automated

      • [ ] Unit/function tests have been automated and incorporated into the
        build.
      • [ ] 100% automated unit/function test coverage for new or changed APIs.

      Secure Design

      • [ ] Security has been assessed and incorporated into your threat model.

      Multidisciplinary Teams Readiness

      Support Readiness

      • [ ] The must-gather script has been updated.

            smeduri1@redhat.com Subbarao Meduri
            ChangLiang Qu