Red Hat Advanced Cluster Management / ACM-3070

Search should expose usage metrics (search-indexer)



      Provide the required acceptance criteria using this template.
      * ...
    • ACM-59 - Drive cloud services investment from telemetry data
    • Observability Sprint 2023-04
    • No

      Value Statement

      Exposing metrics from the search components will help with debugging and with defining and tracking SLOs.

      Definition of Done for Engineering Story Owner (Checklist)

      Reference: https://sre.google/sre-book/monitoring-distributed-systems/

      Metrics to collect and expose from search-indexer:

      • Time it takes to complete a request from a managed cluster (latency)
        • Histogram, split into the following ranges:
        • Full re-sync, 0 to 10,000 resources
        • Full re-sync, 10,001 to 25,000 resources
        • Full re-sync, 25,001 to 100,000 resources
        • Full re-sync, over 100,000 resources
        • Delta sync, 0 to 100 resources
        • Delta sync, over 100 resources
      • Current number of in-flight requests (saturation)
        • Gauge that increases and decreases as requests arrive and are resolved.
        • The value may change too quickly to sample usefully, because a typical request takes less than 1 second. We need to determine a useful sampling interval.
        • Alternatively, we could infer this from the request start and end times.
      • Count rejected requests from managed clusters (saturation/errors)
        • We may already have this from HTTP response code 429 (Too Many Requests).
      • Count error responses (errors)
        • Use the HTTP response code.
      • Retry count from managed clusters
        • We can infer this from the rejected request counts.
      • DB connection counts
        • Not relevant enough; the pool will use all available connections.
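The latency item combines two dimensions: whether a request is a full re-sync or a delta sync, and how many resources it carries. The following Python sketch shows one way to map a request onto the size ranges listed above and record its duration in a cumulative histogram, in the style of a Prometheus histogram (each bucket counts observations less than or equal to its upper bound). The function and class names, label strings, and latency bucket bounds are hypothetical; this is not the search-indexer implementation.

```python
import bisect
from collections import defaultdict


def size_bucket(sync_type: str, resource_count: int) -> str:
    """Map a request to one of the size ranges in this story (hypothetical label strings)."""
    if resource_count < 0:
        raise ValueError("resource_count must be non-negative")
    if sync_type == "full":
        if resource_count <= 10_000:
            return "full_0_10000"
        if resource_count <= 25_000:
            return "full_10001_25000"
        if resource_count <= 100_000:
            return "full_25001_100000"
        return "full_over_100000"
    if sync_type == "delta":
        return "delta_0_100" if resource_count <= 100 else "delta_over_100"
    raise ValueError(f"unknown sync type: {sync_type!r}")


class LatencyHistogram:
    """Minimal cumulative histogram keyed by a size label (illustrative only)."""

    def __init__(self, upper_bounds):
        self.upper_bounds = sorted(upper_bounds)
        # One slot per upper bound plus a final +Inf slot, per label.
        self.counts = defaultdict(lambda: [0] * (len(self.upper_bounds) + 1))
        self.sums = defaultdict(float)

    def observe(self, label: str, seconds: float) -> None:
        # bisect_left finds the first bound >= seconds, matching "le" semantics.
        idx = bisect.bisect_left(self.upper_bounds, seconds)
        self.counts[label][idx] += 1
        self.sums[label] += seconds

    def cumulative(self, label: str) -> list:
        """Return running totals per bucket, ending with the +Inf bucket (total count)."""
        out, running = [], 0
        for count in self.counts[label]:
            running += count
            out.append(running)
        return out


if __name__ == "__main__":
    # Hypothetical latency bounds in seconds.
    hist = LatencyHistogram([0.1, 0.5, 1, 5, 30])
    hist.observe(size_bucket("full", 12_000), 2.3)
    hist.observe(size_bucket("delta", 5), 0.05)
    print(hist.cumulative("full_10001_25000"))
```

Keeping the size range as a label on one histogram, rather than defining a separate metric per range, makes the latency distributions easy to compare side by side. In Prometheus, each distinct label value creates its own time series, so the number of ranges should stay small.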
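Because a typical request finishes in under a second, a periodically sampled in-flight gauge can miss short bursts. If request start and end times are recorded, concurrency can instead be derived after the fact. This Python sketch, with hypothetical function names and timestamps in seconds, computes the peak number of overlapping requests with a sweep over start and end events, and the average concurrency over a time window as total busy time divided by window length.

```python
from typing import Iterable, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds


def peak_concurrency(intervals: Iterable[Interval]) -> int:
    """Maximum number of requests in flight at the same instant.

    A request that ends at the same time another starts is not counted
    as overlapping with it.
    """
    events = []
    for start, end in intervals:
        if end < start:
            raise ValueError("end must not be before start")
        events.append((start, 1))   # request begins
        events.append((end, -1))    # request completes
    # Sort by time; at equal times, completions (-1) sort before starts (+1).
    events.sort()
    in_flight = peak = 0
    for _, delta in events:
        in_flight += delta
        peak = max(peak, in_flight)
    return peak


def average_concurrency(intervals: Iterable[Interval],
                        window_start: float, window_end: float) -> float:
    """Average number of in-flight requests over [window_start, window_end]."""
    if window_end <= window_start:
        raise ValueError("window must have positive length")
    busy = 0.0
    for start, end in intervals:
        # Only the part of each request that overlaps the window counts.
        busy += max(0.0, min(end, window_end) - max(start, window_start))
    return busy / (window_end - window_start)
```

The average over a window is a smoother signal than a point-in-time gauge reading, while the peak shows short bursts that sampling might skip.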
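The rejected-request, error, and retry items all derive from the HTTP status code returned to each managed cluster. This minimal Python sketch tallies hypothetical `(cluster_name, status_code)` pairs, treating 429 (Too Many Requests) as a rejection and any other status of 400 or above as an error. The per-cluster grouping, the function name, and the assumption that each rejected request is retried exactly once (so inferred retries equal rejections) are illustrative assumptions, not documented search-indexer behavior.

```python
from collections import Counter
from http import HTTPStatus
from typing import Dict, Iterable, Tuple


def tally_responses(responses: Iterable[Tuple[str, int]]) -> Dict[str, Counter]:
    """Count outcomes per managed cluster from (cluster_name, status_code) pairs."""
    ok, rejected, errors = Counter(), Counter(), Counter()
    for cluster, status in responses:
        if status == HTTPStatus.TOO_MANY_REQUESTS:  # 429: indexer rejected the request
            rejected[cluster] += 1
        elif status >= 400:                         # any other client or server error
            errors[cluster] += 1
        else:
            ok[cluster] += 1
    return {
        "ok": ok,
        "rejected": rejected,
        "errors": errors,
        # Assumption: every rejected request is retried once by the managed cluster.
        "inferred_retries": Counter(rejected),
    }
```

If managed clusters can retry a request more than once or give up after a rejection, the inferred retry count is only an approximation.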

       

      • Document all currently exposed metrics in the README.
      • Document the metrics in the metrics-chronicole repo.

      Development Complete

      • The code is complete.
      • Functionality is working.
      • Any required downstream Docker file changes are made.

      Tests Automated

      • [ ] Unit/function tests have been automated and incorporated into the build.
      • [ ] 100% automated unit/function test coverage for new or changed APIs.

      Secure Design

      • [ ] Security has been assessed and incorporated into your threat model.

      Multidisciplinary Teams Readiness

      Support Readiness

      • [ ] The must-gather script has been updated.

              jpadilla@redhat.com Jorge Padilla
              jpadilla@redhat.com Jorge Padilla
              Xiang Yin
              Votes: 0
              Watchers: 7

                Created:
                Updated:
                Resolved: