AI Platform Core Components / AIPCC-5636

Update configuration script for instances with NVSwitch


    • Type: Story
    • Resolution: Unresolved
    • Priority: Normal
    • AIPCC Productization

      Automated CUDA smoke tests cannot currently be executed on the following AWS instance types because they use NVSwitch, which requires further configuration before the tests can run.

      • p5.48xlarge
      • p5e.48xlarge
      • p5en.48xlarge
      • p4d.24xlarge
      • p4de.24xlarge
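
      A minimal sketch of how a configuration script might branch on these instance types. It assumes the "further configuration" means enabling NVIDIA Fabric Manager, which is what the linked guide under Notes covers; the issue itself does not confirm this. The function names are hypothetical, and `nvidia-fabricmanager` is the systemd unit name used by the standard Fabric Manager packages.

```python
from __future__ import annotations

# Instance types listed in this issue; all of them use NVSwitch.
NVSWITCH_INSTANCE_TYPES = frozenset({
    "p5.48xlarge",
    "p5e.48xlarge",
    "p5en.48xlarge",
    "p4d.24xlarge",
    "p4de.24xlarge",
})


def needs_nvswitch_setup(instance_type: str) -> bool:
    """Return True if the instance type is one of the NVSwitch types above."""
    return instance_type.strip().lower() in NVSWITCH_INSTANCE_TYPES


def nvswitch_setup_commands(instance_type: str) -> list[list[str]]:
    """Return the commands a setup script could run for this instance type.

    Assumption: enabling the Fabric Manager service is the missing step.
    Nothing is executed here; the caller decides how to run the commands.
    """
    if not needs_nvswitch_setup(instance_type):
        return []
    return [
        # Start the service now and on every boot.
        ["systemctl", "enable", "--now", "nvidia-fabricmanager"],
        # Exits non-zero if the service did not come up.
        ["systemctl", "is-active", "nvidia-fabricmanager"],
    ]
```

      Returning the commands instead of running them keeps the sketch testable without root access or GPU hardware; a real script could pass each entry to `subprocess.run(cmd, check=True)`.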

      Acceptance criteria:

      • RHAIIS container tests run successfully on these instance types
      • Tests use all available GPUs
      • At least two models work, e.g. ibm-granite/granite-3.3-8b-instruct and meta-llama/Meta-Llama-3-8B-Instruct
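
      The acceptance criteria above could be checked along these lines. This is a sketch under assumptions, not the existing test suite: it counts GPUs from the output of `nvidia-smi -L` and builds one argument list per model, so that every visible GPU is used. It assumes the RHAIIS container accepts vLLM's `--model` and `--tensor-parallel-size` options; the helper names are hypothetical.

```python
from __future__ import annotations

import re

# The two example models named in the acceptance criteria.
MODELS = (
    "ibm-granite/granite-3.3-8b-instruct",
    "meta-llama/Meta-Llama-3-8B-Instruct",
)

# `nvidia-smi -L` prints one line per device, starting with "GPU <index>:".
_GPU_LINE = re.compile(r"^GPU \d+:", re.MULTILINE)


def count_gpus(nvidia_smi_list_output: str) -> int:
    """Count the GPUs reported in the text output of `nvidia-smi -L`."""
    return len(_GPU_LINE.findall(nvidia_smi_list_output))


def serve_args(model: str, gpu_count: int) -> list[str]:
    """Build server arguments that shard the model across all GPUs.

    Assumption: the container forwards these options to vLLM.
    """
    if gpu_count < 1:
        raise ValueError("no GPUs detected")
    return ["--model", model, "--tensor-parallel-size", str(gpu_count)]


def build_test_matrix(gpu_count: int) -> list[list[str]]:
    """Return one argument list per model, each using every GPU."""
    return [serve_args(model, gpu_count) for model in MODELS]
```

      Deriving the tensor-parallel size from the detected GPU count, rather than hard-coding it, lets the same test fail loudly when some GPUs are not visible on the instance.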

      Notes:
      https://docs.nvidia.com/datacenter/tesla/fabric-manager-user-guide/index.html#nvidia-server-architectures

              Assignee: Unassigned
              Reporter: Pavol Pitoňák (ppitonak)
              Klara's Team