OpenShift Monitoring / MON-4073

Collect accelerator metrics with OCP monitoring

    • Type: Epic
    • Resolution: Done
    • Priority: Normal
    • Accelerator cards inventory metrics
    • 0% To Do, 0% In Progress, 100% Done
    • Release Note Not Required

      Proposed title of this feature request

      Collect accelerator metrics in OCP

      What is the nature and description of the request?

      With the rise of OpenShift AI, there is a need to collect metrics about accelerator cards (including but not limited to GPUs). Collection should require little to no configuration from customers. We recommend deploying a custom text collector with node_exporter.
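
      As a rough illustration of the recommendation above, the sketch below assumes that "custom text collector" means a script feeding node_exporter's textfile collector, which reads *.prom files from the directory given by --collector.textfile.directory. The metric name node_accelerator_cards, the PCI class filter (3D controllers, 0x0302, and processing accelerators, base class 0x12), and the helper names are all hypothetical and not taken from this request; a real collector would likely need a curated device list.

```python
import os
import tempfile
from collections import Counter
from pathlib import Path


def is_accelerator(pci_class):
    """Assumed heuristic: 3D controller (0x0302xx) or processing accelerator (0x12xxxx)."""
    return (pci_class >> 8) == 0x0302 or (pci_class >> 16) == 0x12


def scan_pci(sysfs_root="/sys/bus/pci/devices"):
    """Return (vendor_id, device_id) pairs for PCI devices matching the heuristic."""
    found = []
    root = Path(sysfs_root)
    if not root.is_dir():
        return found
    for dev in sorted(root.iterdir()):
        try:
            pci_class = int((dev / "class").read_text().strip(), 16)
            vendor = (dev / "vendor").read_text().strip()
            device = (dev / "device").read_text().strip()
        except (OSError, ValueError):
            continue  # skip entries that cannot be read or parsed
        if is_accelerator(pci_class):
            found.append((vendor, device))
    return found


def render_metrics(devices):
    """Render per-model card counts in the Prometheus text exposition format."""
    lines = [
        "# HELP node_accelerator_cards Number of accelerator cards by PCI vendor and device ID.",
        "# TYPE node_accelerator_cards gauge",
    ]
    for (vendor, device), count in sorted(Counter(devices).items()):
        lines.append(
            f'node_accelerator_cards{{vendor="{vendor}",device="{device}"}} {count}'
        )
    return "\n".join(lines) + "\n"


def write_textfile(path, content):
    """Write atomically so node_exporter never scrapes a half-written file."""
    target = Path(path)
    fd, tmp = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        f.write(content)
    os.chmod(tmp, 0o644)  # mkstemp creates the file as 0600; make it readable
    os.replace(tmp, target)


if __name__ == "__main__":
    # Print to stdout; a deployment would call write_textfile() on a schedule.
    print(render_metrics(scan_pci()), end="")
```

      In a deployment, the script would run periodically on each node and call write_textfile() with a path ending in .prom inside the textfile directory, so the inventory is exposed without any customer configuration.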

      Why does the customer need this? (List the business requirements)

      Display inventory data about accelerators in the OCP admin console (as we already do for CPU, memory, and so on, on the Overview page).

      Better understanding of which accelerators are used (Telemetry requirement).

      List any affected packages or components.

      node_exporter

      CMO

              spasquie@redhat.com Simon Pasquier