Feature Request
Resolution: Unresolved
Normal
Currently, ARO supports a set of GPU-enabled virtual machines, including NVIDIA T4 and A100-based instances. However, customers facing A100 capacity constraints in certain Azure regions have been offered H100 instances as an alternative by Microsoft.
H100 instances are not yet officially supported in OpenShift (OCP) and, therefore, not in ARO. As a result, customers running high-performance GPU workloads have no officially supported option when A100 capacity is unavailable.
This RFE proposes adding support for H100-based instances, ensuring that customers running AI/ML workloads in ARO can access the latest generation of GPU-accelerated infrastructure.
Use case / business justification:
- Azure capacity constraints: In certain regions, A100 instances are not available, forcing customers to consider alternatives. H100 is the natural successor to A100.
- Performance advantage: H100 offers significantly higher compute capability than A100, making it well suited to next-generation AI/ML workloads.
- Competitive offering: AKS offers H100-based solutions, and adding H100 support in ARO would help keep Azure competitive for GPU-heavy workloads.
Is triggering: OCPSTRAT-2088 Azure - Add support for NVIDIA H100 and H200 enabled machine series (In Progress)