Feature
Resolution: Unresolved
Normal
Observability
1. Proposed title of this feature request
GPU Metrics in RHACM Observability
2. What is the nature and description of the request?
Customers are running AI / ML workloads on OpenShift Container Platform. As a result, they use GPUs in OpenShift Container Platform worker nodes to accelerate certain workloads.
With the GPU Operator, customers automatically get access to GPU metrics, for example via the NVIDIA DCGM Exporter (https://github.com/NVIDIA/dcgm-exporter). The available metric fields are listed at https://docs.nvidia.com/datacenter/dcgm/latest/dcgm-api/dcgm-api-field-ids.html
This request asks for these metrics to be sent to RHACM as well, so that customers can view and manage their GPU resources centrally. Once the metrics are available in RHACM, RHACM could also provide a dashboard for them.
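As an illustration of what forwarding these metrics could involve, the sketch below builds a ConfigMap for RHACM Observability's custom metrics allowlist. It assumes the documented `observability-metrics-custom-allowlist` ConfigMap in the `open-cluster-management-observability` namespace is the mechanism used, and that the listed DCGM field names are the ones the exporter actually emits on the managed clusters. The metric selection and the helper `build_allowlist_configmap` are hypothetical and not part of this request.

```python
import json

# Illustrative subset of DCGM exporter metric names (assumption: these
# fields are enabled in the exporter's counter configuration).
GPU_METRICS = [
    "DCGM_FI_DEV_GPU_UTIL",
    "DCGM_FI_DEV_GPU_TEMP",
    "DCGM_FI_DEV_POWER_USAGE",
    "DCGM_FI_DEV_FB_USED",
]


def build_allowlist_configmap(metric_names):
    """Return a ConfigMap manifest (as a dict) listing extra metrics to collect.

    Hypothetical helper: the ConfigMap name, namespace, and data key follow
    the RHACM custom metrics convention as an assumption.
    """
    # The allowlist itself is a YAML document stored as a string value.
    metrics_list = "names:\n" + "".join(f"  - {name}\n" for name in metric_names)
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": "observability-metrics-custom-allowlist",
            "namespace": "open-cluster-management-observability",
        },
        "data": {"metrics_list.yaml": metrics_list},
    }


if __name__ == "__main__":
    # Print the manifest as JSON, which oc/kubectl accept as input.
    print(json.dumps(build_allowlist_configmap(GPU_METRICS), indent=2))
```

If the script is saved as, say, `gpu_allowlist.py`, the printed manifest could be piped to `oc apply -f -` on the hub cluster. This request is about making that kind of manual step unnecessary by shipping GPU metrics with RHACM Observability.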
3. Why does the customer need this? (List the business requirements here)
Customers are using OpenShift Container Platform for AI / ML workloads. To better utilise resources and to better understand general GPU usage, these metrics should be available centrally in RHACM.
4. List any affected packages or components.
RHACM Observability
- relates to: ACM-15959 GPU Dashboards for OpenShift Virtualization in RHACM (New)