Epic Goal
- Add Google Cloud's NVIDIA H100 instance type to the "Tested instance types for GCP" section of the documentation for versions 4.12 and later
Why is this important?
- This is a new GPU-enabled machine type from Google Cloud that customers plan to use, and they need assurance that we have validated it for OCP compute nodes
Scenarios
- The A3 machine series (as of today, only a3-highgpu-8g is available) is listed in the OpenShift Container Platform documentation as a "Tested instance type"
Previous Work (Optional):
- The instance has already been validated via NVIDIA-82, where the GPU Operators were validated as well.
Done Checklist
- CI - CI is running, tests are automated and merged.
- Release Enablement <link to Feature Enablement Presentation>
- DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
- DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
- DEV - Downstream build attached to advisory: <link to errata>
- QE - Test plans in Polarion: <link or reference to Polarion>
- QE - Automated tests merged: <link or reference to automated tests>
- DOC - Downstream documentation merged: <link to meaningful PR>
- is cloned by: CORS-3307 List GCP's NVIDIA A100 instances as tested instance type (Closed)
- is related to: OCPBUGS-34638 [GCP NVIDIA H100] "destroy cluster" will hang at "VM has a Local SSD attached but an undefined value for 'discard-local-ssd'" when trying to stop the A3 instance (Closed)