  Observability and Data Analysis Program / OBSDA-989

Collect accelerator metrics with OCP monitoring


    • Type: Feature
    • Resolution: Unresolved
    • Priority: Normal
    • PM Monitoring

      Proposed title of this feature request

      Collect accelerator metrics in OCP

      What is the nature and description of the request?

      With the rise of OpenShift AI, there is a need to collect metrics about accelerator cards (including but not limited to GPUs). Collection should require little to no configuration from customers. We recommend deploying a custom textfile collector script with node_exporter.
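To make the recommendation concrete, here is a minimal sketch of what such a collector script could look like. It is an illustration under stated assumptions, not the design proposed in this request. It assumes accelerators can be spotted on Linux through the PCI class code in sysfs: base class 0x03 (display controller) or 0x12 (processing accelerator). The metric name `node_accelerator_card_info`, its labels, and the function names are hypothetical.

```python
import os
import tempfile
from pathlib import Path

# Assumption for this sketch: PCI base classes that count as accelerators.
# 0x03 = display controller (most GPUs, but also plain VGA adapters),
# 0x12 = processing accelerator.
ACCELERATOR_BASE_CLASSES = {0x03, 0x12}


def read_hex(path: Path) -> int:
    """Parse a sysfs attribute such as '0x10de' into an integer."""
    return int(path.read_text().strip(), 16)


def discover_accelerators(pci_root: Path) -> list:
    """Return one dict per PCI device whose base class looks like an accelerator."""
    devices = []
    for dev in sorted(pci_root.iterdir()):
        try:
            pci_class = read_hex(dev / "class")
            vendor = read_hex(dev / "vendor")
            device = read_hex(dev / "device")
        except (OSError, ValueError):
            continue  # skip entries that are unreadable or not PCI devices
        # The class attribute is 24 bits wide: base class, subclass, interface.
        if (pci_class >> 16) in ACCELERATOR_BASE_CLASSES:
            devices.append({
                "address": dev.name,
                "vendor": f"0x{vendor:04x}",
                "device": f"0x{device:04x}",
            })
    return devices


def render_metrics(devices: list) -> str:
    """Render the devices in the Prometheus text exposition format."""
    lines = [
        "# HELP node_accelerator_card_info Accelerator card found on the PCI bus.",
        "# TYPE node_accelerator_card_info gauge",
    ]
    for d in devices:
        lines.append(
            f'node_accelerator_card_info{{address="{d["address"]}",'
            f'vendor="{d["vendor"]}",device="{d["device"]}"}} 1'
        )
    return "\n".join(lines) + "\n"


def write_atomically(text: str, target: Path) -> None:
    """Write to a temp file, then rename, so a scrape never sees a partial file."""
    fd, tmp = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        f.write(text)
    os.chmod(tmp, 0o644)  # mkstemp creates the file as 0600
    os.replace(tmp, target)
```

A periodic job on each node could call `discover_accelerators(Path("/sys/bus/pci/devices"))`, pass the result to `render_metrics`, and write the output to a `.prom` file in the directory that node_exporter reads through `--collector.textfile.directory`. The write-then-rename step matters because the textfile collector only picks up `*.prom` files and could otherwise read a half-written one.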

      Why does the customer need this? (List the business requirements)

      Display inventory data about accelerators in the OCP admin console (as we already do for CPU, memory, etc. on the Overview page).

      Gain a better understanding of which accelerators are in use (Telemetry requirement).

      List any affected packages or components.

      node_exporter

      CMO (Cluster Monitoring Operator)

              rh-ee-rfloren Roger Florén
              spasquie@redhat.com Simon Pasquier
