-
Epic
-
Resolution: Unresolved
-
Integration of GPU metrics exporter deployment
-
In Progress
-
RHOSSTRAT-1074 - Extension of edpm-ansible with GPU specific software
-
rhos-workloads-vaf
-
86% To Do, 14% In Progress, 0% Done
Goal:
- Provide an Ansible playbook and/or role in edpm-ansible for deploying the GPU metrics exporter.
Acceptance Criteria:
- A patch containing the Ansible playbook/role for the exporter deployment is part of downstream edpm-ansible.
Open questions:
- Are the utilization metrics suitable for Watcher's needs?
- Can we downstream libnvidia-ml?
- If not, is it acceptable to install it from NVIDIA's public repo?
dcgm-exporter runs in a container, but it requires the libnvidia-ml and container toolkit RPMs to be installed on the host (EDPM node). The container toolkit maps the driver and management libraries into the container at runtime, which gives the container access to the hardware.
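
The host-side requirement above could be verified before the exporter container is started. The following is a minimal sketch of such a pre-check, not the planned role itself. The RPM names in REQUIRED_RPMS are assumptions (the actual package names depend on which repo the packages come from, which is still an open question), and the helper names rpm_installed and missing_rpms are hypothetical.

```python
import subprocess
from typing import Callable, Iterable, List

# Assumed RPM names; adjust to whatever the chosen repo actually ships.
REQUIRED_RPMS = ("libnvidia-ml", "nvidia-container-toolkit")


def rpm_installed(name: str) -> bool:
    """Return True if `rpm -q <name>` exits with status 0 on this host."""
    result = subprocess.run(
        ["rpm", "-q", name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode == 0


def missing_rpms(
    names: Iterable[str],
    is_installed: Callable[[str], bool] = rpm_installed,
) -> List[str]:
    """Return the packages from `names` that are not installed, in order.

    `is_installed` is injectable so the logic can be exercised without rpm.
    """
    return [name for name in names if not is_installed(name)]
```

On an EDPM node, calling missing_rpms(REQUIRED_RPMS) would return an empty list when both prerequisites are present. In the role, the same check would more likely be expressed as an Ansible task; the Python form here only illustrates the logic.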