Epic: GPU Sizing Guidance for deep learning workloads
Resolution: Won't Do
Priority: Normal
Labels: Inference, RHELAI, RHOAI, Training
OCP/Telco Definition of Done: Epic Template description and documentation.
Epic Goal
- Produce easy-to-consume guidelines for our customers on the number of GPUs and the GPU family to use for a given class of deep learning workload (an illustrative sizing sketch follows this list)
- Focus on NVIDIA GPUs
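To make the goal concrete, here is a minimal sketch (in Python) of the kind of back-of-the-envelope sizing rule the guidelines could present for training workloads. The 80 GB GPU memory figure, the 16-bytes-per-parameter training budget, and the 1.3x activation headroom factor are assumptions for illustration only, not published NVIDIA guidance, and the function name estimate_training_gpus is hypothetical.

```python
import math

# Illustrative sketch only: the constants below are assumptions for this
# example, not published NVIDIA or Red Hat guidance.
def estimate_training_gpus(params_billions: float,
                           gpu_memory_gb: float = 80.0,
                           bytes_per_param: float = 16.0,
                           activation_headroom: float = 1.3) -> int:
    """Rough minimum GPU count needed to fit a training job in memory.

    bytes_per_param approximates weights + gradients + optimizer state in
    mixed precision; activation_headroom adds slack for activations.
    Throughput targets, parallelism strategy, and interconnect are ignored
    here but would need to be covered by the actual guidance.
    """
    needed_gb = params_billions * bytes_per_param * activation_headroom
    return max(1, math.ceil(needed_gb / gpu_memory_gb))

# Example: a 70B-parameter model -> roughly 1456 GB -> 19 GPUs of 80 GB each.
print(estimate_training_gpus(70))
```

A real guideline would pair rules like this with similar ones for inference (for example, a weights-only budget at lower precision) and then map the resulting counts onto specific GPU families.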
Why is this important?
- We need to simplify and generalize the deep learning performance guide published by NVIDIA: https://developer.nvidia.com/deep-learning-performance-training-inference
Scenarios
- Customers want to understand which GPUs, and how many, they should purchase for their AI/ML workload requirements
Acceptance Criteria
- A published whitepaper with the GPU sizing guidelines
Dependencies (internal and external)
- ...
Previous Work (Optional):
- …
Open questions:
- …
Done Checklist
- CI - CI is running, tests are automated and merged.
- Release Enablement <link to Feature Enablement Presentation>
- DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
- DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
- DEV - Downstream build attached to advisory: <link to errata>
- QE - Test plans in Polarion: <link or reference to Polarion>
- QE - Automated tests merged: <link or reference to automated tests>
- DOC - Downstream documentation merged: <link to meaningful PR>