OpenShift Monitoring / MON-2209

Metrics collection profiles

    • Type: Epic
    • Resolution: Done
    • Priority: Major
    • [Investigation] ability to tune what is collected by the in-cluster monitoring stack

      Epic Goal

      • Investigate options to reduce the amount of metrics collected by the platform monitoring stack.

      Why is this important?

      • The monitoring stack is one of the top contributors when it comes to CPU and RAM consumption.
      • It can be a challenge for users to scale the stack (e.g. Prometheus only scales vertically).
      • A significant fraction of the collected data isn't actively used (e.g. metrics that aren't leveraged by alerting/recording rules, telemetry, dashboards).
      • Customers that run many clusters (telco edge clusters for instance) want to collect and forward operational metrics to a central location. Given bandwidth constraints, they don't need to collect everything locally.
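
      To make the "significant fraction" point above concrete, here is a minimal sketch of one way to estimate the share of active series that no rule or dashboard references. Everything in it is an assumption made for illustration: the per-metric series counts, the PromQL expressions, and the helper names (`used_metrics`, `unused_series_fraction`) are hypothetical, and identifier matching is a crude stand-in for real PromQL parsing. Series count is also only a rough proxy for CPU/RAM usage.

```python
import re

# Hypothetical sample data: active series per metric name, as one might read
# from a cardinality report. The numbers are invented for illustration.
SERIES_PER_METRIC = {
    "up": 120,
    "container_cpu_usage_seconds_total": 4000,
    "container_memory_working_set_bytes": 4000,
    "apiserver_request_duration_seconds_bucket": 30000,
    "node_cpu_seconds_total": 1500,
}

# Hypothetical PromQL expressions taken from alerting rules, recording rules
# and dashboards.
EXPRESSIONS = [
    "up == 0",
    "sum(rate(container_cpu_usage_seconds_total[5m])) by (namespace)",
    "container_memory_working_set_bytes > 1e9",
]

# Matches identifier-like tokens (metric names, functions, label names).
TOKEN = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def used_metrics(expressions, known_metrics):
    """Return the known metric names that appear as tokens in any expression.

    Simplification: a label or function sharing a metric's name would count
    as a use, so this can over-report usage.
    """
    tokens = set()
    for expr in expressions:
        tokens.update(TOKEN.findall(expr))
    return tokens & set(known_metrics)


def unused_series_fraction(series_per_metric, expressions):
    """Fraction of active series whose metric is not referenced anywhere."""
    total = sum(series_per_metric.values())
    if total == 0:
        return 0.0
    used = used_metrics(expressions, series_per_metric)
    unused = sum(n for name, n in series_per_metric.items() if name not in used)
    return unused / total


if __name__ == "__main__":
    fraction = unused_series_fraction(SERIES_PER_METRIC, EXPRESSIONS)
    print(f"unreferenced series: {fraction:.1%}")
```

      With real inputs, the series counts would come from the cluster's Prometheus and the expressions from the shipped rules and dashboards. The result only bounds the potential savings and does not replace measuring CPU/RAM directly.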

      Scenarios

      1. As an OpenShift monitoring developer, I want to quantify how much CPU/RAM would be saved if the monitoring stack collected only the metrics of operational interest (e.g. metrics used for alerting, telemetry and dashboarding).
      2. As an OpenShift monitoring developer, I need to

      Acceptance Criteria

      • Document detailing the potential resource savings.
      • Design document explaining how it could be implemented in practice (probably an OpenShift enhancement proposal).
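
      As one possible direction for such a design (an assumption, not something this epic decides), Prometheus can drop unwanted series at scrape time with a `keep` rule under `metric_relabel_configs`. The sketch below builds such a rule from an allow-list of metric names. The helper names `keep_regex` and `keep_relabel_rule` are hypothetical; the field names follow the native Prometheus configuration format.

```python
import re

# Same character set that classic Prometheus metric names allow.
METRIC_NAME = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*$")


def keep_regex(metric_names):
    """Join metric names into one alternation, e.g. "a|b|c".

    Prometheus anchors relabel regexes on both ends, so no ^ or $ is added.
    Names are validated instead of escaped, so the result has no surprises.
    """
    names = sorted(set(metric_names))
    for name in names:
        if not METRIC_NAME.match(name):
            raise ValueError(f"not a valid metric name: {name!r}")
    return "|".join(names)


def keep_relabel_rule(metric_names):
    """Return a metric_relabel_configs entry keeping only the listed metrics."""
    return {
        "source_labels": ["__name__"],
        "regex": keep_regex(metric_names),
        "action": "keep",
    }


if __name__ == "__main__":
    # Hypothetical allow-list for illustration only.
    print(keep_relabel_rule(["up", "node_cpu_seconds_total"]))
```

      Note that relabeling at this stage happens after the scrape, so it reduces ingested series rather than scrape traffic. Whether that trade-off is acceptable is one of the questions the design document would need to answer.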

      Dependencies (internal and external)

      1. ...

      Previous Work (Optional):

      1. MON-1671 (investigate dropped metrics & resource savings)
      2. MON-1672 (investigate lower res metrics & resource savings)

      Open questions:

      Done Checklist

      • CI - CI is running, tests are automated and merged.
      • Release Enablement <link to Feature Enablement Presentation>
      • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
      • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
      • DEV - Downstream build attached to advisory: <link to errata>
      • QE - Test plans in Polarion: <link or reference to Polarion>
      • QE - Automated tests merged: <link or reference to automated tests>
      • DOC - Downstream documentation merged: <link to meaningful PR>

            spasquie@redhat.com Simon Pasquier