OpenShift Monitoring / MON-4073

Collect accelerator metrics with OCP monitoring


    • Epic
    • Resolution: Unresolved
    • Normal
    • Accelerator cards inventory metrics
    • NEW
    • To Do
    • NEW
    • 100% To Do, 0% In Progress, 0% Done

      Proposed title of this feature request

      Collect accelerator metrics in OCP

      What is the nature and description of the request?

      With the rise of OpenShift AI, there is a need to collect metrics about accelerator cards (including but not limited to GPUs). Collection should require little to no configuration from customers. We recommend deploying a custom collector that feeds the node_exporter textfile collector.
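
      A minimal sketch of what such a custom collector could look like, assuming the recommendation refers to node_exporter's textfile collector, which reads files ending in .prom from a configured directory. The metric name accelerator_card_info, the label names, and the sample inventory are hypothetical and not part of this request; discovering the cards on a node is left out entirely.

```python
import os
import tempfile

# Hypothetical metric name; the request does not define one.
METRIC = "accelerator_card_info"


def escape_label(value: str) -> str:
    """Escape a label value per the Prometheus text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(cards):
    """Render one info-style sample (value 1) per accelerator card."""
    lines = [
        f"# HELP {METRIC} Inventory information about an accelerator card.",
        f"# TYPE {METRIC} gauge",
    ]
    for card in cards:
        labels = ",".join(
            f'{key}="{escape_label(str(card[key]))}"' for key in sorted(card)
        )
        lines.append(f"{METRIC}{{{labels}}} 1")
    return "\n".join(lines) + "\n"


def write_atomically(directory, filename, text):
    """Write to a temp file, then rename it, so a scrape never sees a partial file."""
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    os.chmod(tmp_path, 0o644)  # mkstemp creates the file as 0600
    os.replace(tmp_path, os.path.join(directory, filename))


if __name__ == "__main__":
    # Hypothetical inventory; a real collector would discover cards on the node.
    cards = [
        {"vendor": "example-vendor", "model": "example-model", "pci_address": "0000:3b:00.0"},
    ]
    out_dir = tempfile.mkdtemp()  # stand-in for the textfile collector directory
    write_atomically(out_dir, "accelerators.prom", render_metrics(cards))
```

      Exposing inventory as an info-style gauge with a constant value of 1 keeps the data in labels, which fits the console and Telemetry use cases listed below. The temporary file uses a .tmp suffix so that the textfile collector ignores it until the rename completes.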

      Why does the customer need this? (List the business requirements)

      Display inventory data about accelerators in the OCP admin console, as we already do for CPU, memory, and other resources on the Overview page.

      Better understanding of which accelerators are used (Telemetry requirement).

      List any affected packages or components.

      node_exporter

      CMO

            spasquie@redhat.com Simon Pasquier