OpenShift Container Platform (OCP) Strategy / OCPSTRAT-2088

Azure - Add support for NVIDIA H100 and H200 enabled machine series


    • Product / Portfolio Work
    • 0% To Do, 100% In Progress, 0% Done

      Feature Overview (aka. Goal Summary)  

      Add support for the NVIDIA H100- and H200-enabled machine series so that they can be used for OpenShift deployments on Azure.

      Goals (aka. expected user outcomes)

      Support deploying OpenShift on Azure with the following machine series:

      • ND-H100-v5
        • Standard_ND96isr_H100_v5
      • NCads_H100_v5
        • Standard_NC40ads_H100_v5
        • Standard_NC80adis_H100_v5
      • NCCads_H100_v5
        • Standard_NCC40ads_H100_v5
      • ND-H200-v5
        • Standard_ND96isr_H200_v5

      Requirements (aka. Acceptance Criteria):

      All of these machine series can be selected at install time to deploy OpenShift on Azure.
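
      As a rough illustration of what selecting one of these sizes at install time could look like, the sketch below builds a compute-pool stanza of the kind found in install-config.yaml, where the Azure VM size is set under platform.azure.type. The helper name worker_pool, the IN_SCOPE_SIZES mapping, and the default replica count are hypothetical and exist only for this example; the size names are copied from the Goals list above.

```python
# Machine series and VM sizes in scope for this Feature (copied from the Goals list).
IN_SCOPE_SIZES = {
    "ND-H100-v5": ["Standard_ND96isr_H100_v5"],
    "NCads_H100_v5": ["Standard_NC40ads_H100_v5", "Standard_NC80adis_H100_v5"],
    "NCCads_H100_v5": ["Standard_NCC40ads_H100_v5"],
    "ND-H200-v5": ["Standard_ND96isr_H200_v5"],
}


def worker_pool(vm_size, replicas=3):
    """Return a compute-pool stanza (as a dict) that requests the given Azure VM size.

    Hypothetical helper: it only checks the size against the list in this
    Feature and does not call Azure or the installer.
    """
    allowed = {size for sizes in IN_SCOPE_SIZES.values() for size in sizes}
    if vm_size not in allowed:
        raise ValueError(f"{vm_size} is not one of the sizes listed in this Feature")
    return {
        "name": "worker",
        "replicas": replicas,
        "platform": {"azure": {"type": vm_size}},
    }


if __name__ == "__main__":
    # Example: a worker pool on the NCads_H100_v5 series.
    print(worker_pool("Standard_NC40ads_H100_v5"))
```

      The resulting dict would still need to be serialized into the compute section of install-config.yaml; regional availability and quota for these sizes are outside what this sketch checks.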

       

      Anyone reviewing this Feature needs to know which deployment configurations the Feature will apply to (or not) once it has been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out of scope for a given release, also provide the OCPSTRAT that tracks the configuration to be supported in the future.

      Deployment considerations | List applicable specific needs (N/A = not applicable)
      Self-managed, managed, or both  
      Classic (standalone cluster)  
      Hosted control planes  
      Multi node, Compact (three node), or Single node (SNO), or all  
      Connected / Restricted Network  
      Architectures, e.g. x86_64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x)  
      Operator compatibility  
      Backport needed (list applicable versions)  
      UI need (e.g. OpenShift Console, dynamic plugin, OCM)  
      Other (please specify)  

      Background

      Customer demand for running AI-enabled workloads in the cloud keeps increasing. To support these customers, we need to enable the latest GPUs available on the market.

      Documentation Considerations

      Update the usual documentation to list these machine series as tested.

      Interoperability Considerations

      This feature will be consumed by Azure Red Hat OpenShift (ARO) later.

              linnguye.openshift Linh Nguyen
              mak.redhat.com Marcos Entenza Garcia
              None
              Erwan Gallen, Jerome Boutaud, Oren Kashi
              Patrick Dillon
              Jinyun Ma
              Avani Bhatt
              Derrick Ornelas
              Erwan Gallen