Red Hat Enterprise Linux AI / RHELAI-4114

ilab profile for "NVIDIA H200 X4" is misnamed as "NVIDIA H100 X8"

      To Reproduce

      Steps to reproduce the behavior:

      1. Run `ilab config init` on unrecognized hardware
      2. Select "NVIDIA"
      3. Observe that profile [11] is misnamed: it is listed as "NVIDIA H100 X8" (duplicating [6]) where "NVIDIA H200 X4" is expected:
      Please choose a system profile.
      Profiles set hardware-specific defaults for all commands and sections of the configuration.
      First, please select the hardware vendor your system falls into
      [0] NO SYSTEM PROFILE
      [1] NVIDIA
      Enter the number of your choice [0]: 1
      You selected: NVIDIA
      Next, please select the specific hardware configuration that most closely matches your system.
      [0] NO SYSTEM PROFILE
      [1] NVIDIA L4 X8
      [2] NVIDIA L40S X4
      [3] NVIDIA L40S X8
      [4] NVIDIA H100 X4
      [5] NVIDIA H100 X2
      [6] NVIDIA H100 X8
      [7] NVIDIA A100 X4
      [8] NVIDIA A100 X2
      [9] NVIDIA A100 X8
      [10] NVIDIA H200 X8
      [11] NVIDIA H100 X8
      [12] NVIDIA H200 X2
      [13] NVIDIA H200 X1
      Enter the number of your choice [hit enter for hardware defaults] [0]
      

      4. View .local/share/instructlab/internal/system_profiles/nvidia/h200/h200_x4.yaml. The metadata section is incorrect (it describes an H100 X8 system), while every other section appears correct:

      metadata:
        gpu_manufacturer: Nvidia
        gpu_family: H100
        gpu_count: 8
        gpu_sku: [NVL, PCIe]
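
      A mismatch like this can be caught automatically by comparing each profile's file name with its metadata section. The following is a minimal sketch, not part of InstructLab. It assumes that profile files are named `<family>_x<count>.yaml` (as `h200_x4.yaml` is) and that the metadata has already been loaded into a dictionary, for example with a YAML parser. The function names `expected_from_filename` and `metadata_mismatches` are hypothetical.

```python
import re
from pathlib import PurePosixPath

# Assumption: profile files are named "<family>_x<count>.yaml", e.g. h200_x4.yaml.
_NAME_RE = re.compile(r"^(?P<family>[a-z0-9]+)_x(?P<count>\d+)$")


def expected_from_filename(path):
    """Derive the GPU family and GPU count implied by a profile file name."""
    stem = PurePosixPath(path).stem.lower()
    match = _NAME_RE.match(stem)
    if match is None:
        raise ValueError(f"unrecognized profile file name: {path}")
    return match["family"].upper(), int(match["count"])


def metadata_mismatches(path, metadata):
    """Return a list of messages describing where metadata disagrees with the file name."""
    family, count = expected_from_filename(path)
    problems = []
    actual_family = str(metadata.get("gpu_family", "")).upper()
    if actual_family != family:
        problems.append(
            f"gpu_family is {metadata.get('gpu_family')!r}, file name implies {family!r}"
        )
    if metadata.get("gpu_count") != count:
        problems.append(
            f"gpu_count is {metadata.get('gpu_count')!r}, file name implies {count}"
        )
    return problems


# Metadata exactly as reported in this issue for h200_x4.yaml.
reported = {
    "gpu_manufacturer": "Nvidia",
    "gpu_family": "H100",
    "gpu_count": 8,
    "gpu_sku": ["NVL", "PCIe"],
}

for problem in metadata_mismatches("nvidia/h200/h200_x4.yaml", reported):
    print(problem)
```

      Run against the metadata shown above, the sketch reports both `gpu_family` and `gpu_count` as inconsistent with the file name. It does not check `gpu_sku`, because the correct SKU list for H200 is not stated in this report.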
      
      

       

      Device Info (please complete the following information):

      • Hardware Specs: Any cloud instance (AWS EC2 or similar) whose hardware is not recognized, such as dual NVIDIA L40S (IBM Cloud: gx3-48x240x2l40s)
      • OS Version: RHEL AI staging 1.5-5
      • InstructLab Version: 0.26.0
      • Image: registry.stage.redhat.io/rhelai1/bootc-nvidia-rhel9:1.5

      Bug impact

      • The typo is clearly visible whenever a user has to select their hardware.
      • A user who has H200 X4 and must use the selector is unlikely to pick the right profile, because it is listed as "NVIDIA H100 X8".
      • Whether the H200 X4 profile actually works has not been tested.

              cdoern@redhat.com Charles Doern
              mdepaulo@redhat.com Mike DePaulo