-
Story
-
Resolution: Done
*This story will be tied to an Epic that is still to come*
For the long-term GPU story, we want to add a dedicated overview page to the console.
Requirements
- Determine where this page would live.
- Align with the current OCP dashboards and their cards, but also consider cards that are not yet implemented in the current overview pages.
Possible data points
- GPU Allocation
- Percentage of GPU used on the cluster
- Number of GPUs
  - Total GPUs
  - Used GPUs
  - Idle GPUs
- Framebuffer memory
  - Total GB
  - Used GB
  - Idle GB
- Power consumption
  - Total watts
  - Used watts
  - Idle watts
- Maximum number of jobs that can be scheduled
- Number of physical GPUs used per worker node
- Job tracking
  - Jobs with idle GPUs
  - Jobs with an error
  - Jobs with a long duration
- Job Queue
- Node Downtime
- GPU utilization history
  - GPU use per user
  - GPU use per project
- Physical tracking
  - Average temperature
  - Maximum temperature across all GPUs in the cluster
  - Minimum server fan speed across all GPUs in the cluster
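
To make the arithmetic behind several of these data points concrete, here is a minimal TypeScript sketch of how per-GPU samples might be rolled up into cluster-level values for the overview cards. Everything in it is an assumption for illustration: the `GpuSample` and `GpuOverview` shapes, their field names, and the `summarizeGpus` helper are hypothetical and do not correspond to an existing console or exporter API. It also assumes that "used" means a GPU currently allocated to a workload, and that "used watts" is the power drawn by those allocated GPUs.

```typescript
// Hypothetical per-GPU sample; field names are illustrative only.
interface GpuSample {
  node: string;               // worker node hosting the GPU
  allocated: boolean;         // assumption: "used" means allocated to a workload
  framebufferTotalGB: number;
  framebufferUsedGB: number;
  powerWatts: number;
  temperatureC: number;
}

// Hypothetical cluster-level summary backing the overview cards.
interface GpuOverview {
  totalGpus: number;
  usedGpus: number;
  idleGpus: number;
  percentUsed: number;
  framebufferTotalGB: number;
  framebufferUsedGB: number;
  framebufferIdleGB: number;
  totalWatts: number;
  usedWatts: number;
  idleWatts: number;
  averageTemperatureC: number | null; // null when there are no GPUs
  maxTemperatureC: number | null;
  gpusUsedPerNode: Record<string, number>;
}

function summarizeGpus(samples: GpuSample[]): GpuOverview {
  const used = samples.filter((s) => s.allocated);
  const sum = (items: GpuSample[], pick: (s: GpuSample) => number): number =>
    items.reduce((acc, s) => acc + pick(s), 0);

  const framebufferTotalGB = sum(samples, (s) => s.framebufferTotalGB);
  const framebufferUsedGB = sum(samples, (s) => s.framebufferUsedGB);
  const totalWatts = sum(samples, (s) => s.powerWatts);
  const usedWatts = sum(used, (s) => s.powerWatts);

  // Count allocated GPUs per worker node.
  const gpusUsedPerNode: Record<string, number> = {};
  for (const s of used) {
    gpusUsedPerNode[s.node] = (gpusUsedPerNode[s.node] ?? 0) + 1;
  }

  const hasGpus = samples.length > 0;
  return {
    totalGpus: samples.length,
    usedGpus: used.length,
    idleGpus: samples.length - used.length,
    percentUsed: hasGpus ? (used.length / samples.length) * 100 : 0,
    framebufferTotalGB,
    framebufferUsedGB,
    framebufferIdleGB: framebufferTotalGB - framebufferUsedGB,
    totalWatts,
    usedWatts,
    idleWatts: totalWatts - usedWatts,
    averageTemperatureC: hasGpus
      ? sum(samples, (s) => s.temperatureC) / samples.length
      : null,
    maxTemperatureC: hasGpus
      ? Math.max(...samples.map((s) => s.temperatureC))
      : null,
    gpusUsedPerNode,
  };
}
```

Keeping the idle values derived (total minus used) rather than stored separately means the three numbers on each card cannot drift out of sync. Job tracking, utilization history, and fan speed would need additional inputs and are left out of this sketch.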