- Bug
- Resolution: Unresolved
- Major
- RHELAI 1.3 GA
The H100 profiles do not match the expected code-to-file-name translation logic, because this card reports itself as:
_CudaDeviceProperties(name='NVIDIA H100 80GB HBM3', major=9, minor=0, total_memory=80994MB, multi_processor_count=132)
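The mismatch can be illustrated with a small sketch. It assumes, hypothetically, that profile selection starts from the name string returned by `torch.cuda.get_device_properties(0).name` and that the file-name lookup needs a family token such as `h100`. The helper `gpu_family` and its family list are illustrative only, not the actual RHEL AI detection code.

```python
import re
from typing import Optional

# Hypothetical family list; extend as needed. Word boundaries stop "H100"
# from matching inside longer tokens.
_FAMILY_RE = re.compile(r"\b(H100|A100|L40S)\b", re.IGNORECASE)


def gpu_family(device_name: str) -> Optional[str]:
    """Return a lowercase family token such as 'h100', or None if unrecognized."""
    match = _FAMILY_RE.search(device_name)
    return match.group(1).lower() if match else None


# An exact match on one marketing name misses the other variants; matching on
# the family token maps every H100 variant name to the same key.
for name in ["NVIDIA H100 80GB HBM3", "NVIDIA H100 PCIe", "NVIDIA H100 NVL", "NVIDIA H100"]:
    print(f"{name!r} -> {gpu_family(name)}")
```

Matching on a family token rather than the full name keeps the memory size and form factor suffixes ("80GB HBM3", "PCIe", "NVL") from changing which profile is chosen.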
There are several H100 variants (PCI device ID, subsystem ID, subsystem vendor ID, name):
0x2321, 0x1839, 0x10de, "NVIDIA H100 NVL"
0x2330, 0x16c0, 0x10de, "NVIDIA H100 80GB HBM3"
0x2330, 0x16c1, 0x10de, "NVIDIA H100 80GB HBM3"
0x2331, 0x1626, 0x10de, "NVIDIA H100 PCIe"
0x2339, 0x17fc, 0x10de, "NVIDIA H100"
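The variants above share H100 branding but not a single name string, so keying detection on PCI IDs is one way to cover them all. This sketch hard-codes only the five rows listed here and assumes the column order given above; `H100_VARIANTS`, `is_h100` and `variant_name` are hypothetical names. On Linux, the IDs for a given card can be read from sysfs (`/sys/bus/pci/devices/<address>/device` and `subsystem_device`).

```python
from typing import Dict, Optional, Tuple

# (device ID, subsystem ID) -> marketing name, copied from the variant list
# in this ticket. Real systems may report IDs that are not listed here.
H100_VARIANTS: Dict[Tuple[int, int], str] = {
    (0x2321, 0x1839): "NVIDIA H100 NVL",
    (0x2330, 0x16C0): "NVIDIA H100 80GB HBM3",
    (0x2330, 0x16C1): "NVIDIA H100 80GB HBM3",
    (0x2331, 0x1626): "NVIDIA H100 PCIe",
    (0x2339, 0x17FC): "NVIDIA H100",
}

# The device ID alone is enough to flag a card as one of these variants,
# regardless of its board-specific subsystem ID.
H100_DEVICE_IDS = frozenset(dev for dev, _ in H100_VARIANTS)


def is_h100(device_id: int) -> bool:
    """True if the PCI device ID belongs to one of the listed H100 variants."""
    return device_id in H100_DEVICE_IDS


def variant_name(device_id: int, subsystem_id: int) -> Optional[str]:
    """Marketing name for an exact (device, subsystem) match, else None."""
    return H100_VARIANTS.get((device_id, subsystem_id))


print(is_h100(0x2330), variant_name(0x2330, 0x16C1))
```

For the card in this ticket, this prints `True NVIDIA H100 80GB HBM3`. Checking `is_h100` first and treating `variant_name` as optional keeps a new subsystem ID from breaking detection.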
A full list of NVIDIA card identifiers is available in the open GPU kernel modules source (scroll past the nulls):
https://github.com/NVIDIA/open-gpu-kernel-modules/blob/main/src/nvidia/generated/g_nv_name_released.h
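To build a fuller table than the five rows above, the header can be scanned for its entries. This is a minimal sketch that assumes each entry has the shape `{ 0x2330, 0x16C1, 0x10de, "NVIDIA H100 80GB HBM3" },`. Any text that does not match that shape is ignored. `parse_released_chips` is a hypothetical helper, not part of the NVIDIA repository.

```python
import re
from typing import List, Tuple

# Assumed entry shape (hypothetical, based on the rows quoted in this ticket):
#     { 0x2330, 0x16C1, 0x10de, "NVIDIA H100 80GB HBM3" },
_ENTRY_RE = re.compile(
    r'\{\s*(0x[0-9A-Fa-f]+)\s*,\s*(0x[0-9A-Fa-f]+)\s*,\s*(0x[0-9A-Fa-f]+)\s*,\s*"([^"]*)"\s*\}'
)

Chip = Tuple[int, int, int, str]


def parse_released_chips(header_text: str) -> List[Chip]:
    """Return (device ID, subsystem ID, subsystem vendor ID, name) tuples."""
    return [
        (int(dev, 16), int(sub, 16), int(vendor, 16), name)
        for dev, sub, vendor, name in _ENTRY_RE.findall(header_text)
    ]


if __name__ == "__main__":
    sample = """
    { 0x2330, 0x16C1, 0x10de, "NVIDIA H100 80GB HBM3" },
    { 0x2331, 0x1626, 0x10de, "NVIDIA H100 PCIe" },
    """
    # Keep only entries whose name mentions H100.
    for dev, sub, vendor, name in parse_released_chips(sample):
        if "H100" in name:
            print(f"{dev:#06x} {sub:#06x} {vendor:#06x} {name}")
```

Filtering on the name is only a convenience for finding candidate rows; the resulting device IDs are what a detection table would be keyed on.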
- clones RHELAI-2387: H100 systems are not correctly identified by auto detection (Closed)