- Feature
- Resolution: Unresolved
- OBSDA-914 OpenTelemetry Integrations
- 100% To Do, 0% In Progress, 0% Done
Proposed title of this feature request
OpenTelemetry GPU metrics integration
What is the nature and description of the request?
Provide a way to extract, store, visualize and export GPU usage telemetry data in various formats, including OpenTelemetry and Prometheus.
Why does the customer need this? (List the business requirements)
Data scientists who run workloads in the cloud need GPU usage data to control costs. This data is not available today. Even once it is available, customers want to use OpenShift to send it to many destinations: on-premises platforms, self-supported platforms, and/or observability solutions. The business requirements of this feature are therefore:
- Provide a way to read, transform and store GPU data in both OTLP and Prometheus formats
- Provide a dashboard in the OpenShift console that shows relevant GPU information
- Document this integration as part of the integrations framework of this outcome: OBSDA-914
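As a rough illustration of the first requirement, the sketch below renders GPU gauge readings named in the OpenTelemetry style as Prometheus text exposition output. It is not the proposed implementation. The metric name `hw.gpu.utilization` and the attribute `hw.id` are taken from the hardware semantic conventions linked under Further reading. The `GpuSample` type, the function names and the sample values are hypothetical, and the name mapping is deliberately simplified (dots become underscores, with no unit or type suffixes).

```python
from dataclasses import dataclass, field


@dataclass
class GpuSample:
    """One gauge reading for a single GPU (hypothetical container type)."""
    name: str                      # OpenTelemetry-style metric name
    value: float
    attributes: dict = field(default_factory=dict)


def to_prometheus_name(otel_name: str) -> str:
    # Simplified mapping (assumption): Prometheus names cannot contain dots,
    # so replace them. Unit and type suffixes are intentionally omitted.
    return otel_name.replace(".", "_")


def to_prometheus_text(samples: list) -> str:
    """Render samples in the Prometheus text exposition format."""
    grouped = {}  # metric name -> sample lines, so each metric stays contiguous
    for s in samples:
        name = to_prometheus_name(s.name)
        # Label values are not escaped here; real data would need escaping.
        labels = ",".join(
            f'{to_prometheus_name(k)}="{v}"'
            for k, v in sorted(s.attributes.items())
        )
        line = f"{name}{{{labels}}} {s.value}" if labels else f"{name} {s.value}"
        grouped.setdefault(name, []).append(line)

    lines = []
    for name, sample_lines in grouped.items():
        lines.append(f"# TYPE {name} gauge")
        lines.extend(sample_lines)
    return "\n".join(lines) + "\n"


# Made-up readings for two GPUs, for illustration only.
samples = [
    GpuSample("hw.gpu.utilization", 0.42, {"hw.id": "gpu-0"}),
    GpuSample("hw.gpu.utilization", 0.07, {"hw.id": "gpu-1"}),
]
print(to_prometheus_text(samples), end="")
```

In an actual deployment this translation would more likely be done by a collector pipeline than by hand-written code. The sketch only shows that one reading can be exposed in both naming schemes.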
List any affected packages or components.
- Red Hat build of OpenTelemetry
Further reading
- https://opentelemetry.io/docs/specs/semconv/system/hardware-metrics/#hwgpu---gpu-metrics
- https://github.com/openlit/openlit/tree/main/otel-gpu-collector/
- https://github.com/NVIDIA/gpu-operator
- https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/index.html (includes OpenShift documentation)
- https://github.com/utkuozdemir/nvidia_gpu_exporter
- https://community.ibm.com/community/user/instana/blogs/yanwei-li/2024/06/14/gpu-observability-with-instana