Red Hat Enterprise Linux AI
RHELAI-3902

Document support for NVIDIA MIG in RHEL AI

    • Type: Task
    • Resolution: Unresolved
    • Component: Accelerators - NVIDIA
    • Release Notes

      RHEL AI can support NVIDIA MIG (Multi-Instance GPU) devices when containers are run with Podman and the nvidia-container-runtime. This setup does not require Podman to be natively CDI-enabled, because MIG instances can be specified explicitly through environment variables.

      We should document this capability and provide guidance for end-users.

      Scope:
      Document how to use NVIDIA_VISIBLE_DEVICES with MIG UUIDs or CDI device names for inferencing in Podman-based RHEL AI environments.
      Include examples like:

      # Run a container pinned to a single MIG instance via its CDI device name
      podman run --runtime=nvidia \
        -e NVIDIA_VISIBLE_DEVICES=nvidia.com/gpu=mig1:0 \
        <image> nvidia-smi -L
      

      Reference official NVIDIA documentation:
      https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html#using-cdi-with-non-cdi-enabled-runtimes
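
      The MIG UUIDs to pass via NVIDIA_VISIBLE_DEVICES can be read from `nvidia-smi -L`. A minimal sketch of collecting them, shown here against canned sample output with hypothetical UUIDs (on a real host, pipe `nvidia-smi -L` output through the same filter):

```shell
# Hypothetical sample of `nvidia-smi -L` output; the UUIDs are made up.
sample='GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-5c89852c-d268-4c89-a747-e1a7f2e9b8e1)
  MIG 1g.5gb Device 0: (UUID: MIG-c7c9b3b8-0b7a-4b1e-9f1a-111111111111)
  MIG 1g.5gb Device 1: (UUID: MIG-2e3f4a5b-6c7d-8e9f-0a1b-222222222222)'

# Keep only MIG UUIDs and join them with commas, the separator
# NVIDIA_VISIBLE_DEVICES accepts for multiple devices.
mig_ids=$(printf '%s\n' "$sample" | grep -o 'MIG-[0-9a-f-]*' | paste -s -d ',' -)
echo "$mig_ids"
```

      The resulting comma-separated list can then be passed as-is to `-e NVIDIA_VISIBLE_DEVICES=...` on the podman command line.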

      Note that CUDA_VISIBLE_DEVICES will also respect MIG UUIDs or CDI device names inside the container, enabling compatible CUDA-based applications.

      Consider updating ilab to preserve the user-specified NVIDIA_VISIBLE_DEVICES environment variable when launching inference jobs.
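
      As a sanity check for that behavior, the variable only needs to survive into the child process environment when the launcher shells out to podman. A trivial, hardware-free sketch (the MIG UUID is hypothetical):

```shell
# Hypothetical MIG UUID; on a real host this would come from `nvidia-smi -L`.
export NVIDIA_VISIBLE_DEVICES='MIG-c7c9b3b8-0b7a-4b1e-9f1a-111111111111'

# An exported variable is inherited by child processes, which is what
# ilab would need to preserve when invoking podman on the user's behalf.
sh -c 'echo "child sees: $NVIDIA_VISIBLE_DEVICES"'
```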

              fdupont@redhat.com Fabien Dupont
              rh-ee-raravind Reshmi Aravind
              Votes: 1
              Watchers: 3
