RHELAI-3627

Llama 3.3 70B RHELAI vllm inference flow


    • RHELAI-3557: RHEL AI Third-Party Model Validation Deliverables for Summit '25

      Feature Overview:
      This feature card is part of validating 3rd-party models in the vllm inference flow for RHELAI 1.5. This is separate from the ilab model serve inference validation.

      3rd-party model for this card: Llama 3.3 70B Instruct 

      Goals:

      • Serve Llama 3.3 70B with vllm in RHELAI 1.5 - functional test
        • Chat with it to confirm it functions
        • No errors/warnings arise
      • Start documentation for MVP vllm inferencing on RHELAI 1.5 
      • Run for the base model and all quantized variants (INT4, INT8, FP8)

       Out of Scope [To be updated post-refinement]:

      • ilab model serve functional testing; this is a separate endeavor

      Requirements:

      • Documentation to be updated to reflect the workaround for directly deploying vllm for inferencing on RHELAI
        • Specifying the entrypoint command to run vllm when the container starts, e.g. with Podman:
        • Specifying how models are downloaded and referenced in the command to be served
        • ENTRYPOINT=/opt/app-root/bin/vllm   # vllm binary inside the RHEL AI container image
          IMAGE=registry.redhat.io/rhelai1/instructlab-nvidia-rhel9:1.4-1738905416

          # Run the container with vllm as the entrypoint and serve a locally
          # downloaded model on the OpenAI-compatible API at 127.0.0.1:8000
          podman run --rm -ti \
            --device "nvidia.com/gpu=all" \
            --security-opt "label=disable" \
            --net host \
            --shm-size 10G \
            --pids-limit -1 \
            -v "$HOME:$HOME" \
            --entrypoint "$ENTRYPOINT" \
            "$IMAGE" \
            serve ~/models/a57d425d-80c2-4361-bbf7-23f1262ceea1 \
              --served-model-name wcabanba0308sves-mlang-skill \
              --host 127.0.0.1 --port 8000
      • All base and quantized variants of the model can be served via this workaround (a sample functional check is sketched below)
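
      As a minimal functional check (a sketch, assuming the podman command above is running; vllm serves an OpenAI-compatible API, the model name matches --served-model-name, and the prompt is illustrative):

          curl -s http://127.0.0.1:8000/v1/chat/completions \
            -H "Content-Type: application/json" \
            -d '{
                  "model": "wcabanba0308sves-mlang-skill",
                  "messages": [{"role": "user", "content": "Reply with a one-sentence greeting."}],
                  "max_tokens": 50
                }'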

      Done - Acceptance Criteria:

      • Documentation is updated
      • All base and quantized versions of the model are confirmed to meet the requirements and all have an 'X' in the Confirmed boxes

      Use Cases - i.e. User Experience & Workflow:

      • User downloads the model via Quay or Hugging Face (see the illustrative download sketch after this list)
      • User serves the model directly with vllm by following the documentation, bypassing the other components of the RHELAI container
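
      An illustrative sketch of the download step (the Hugging Face repository ID and local directory below are assumptions, not the validated artifact names; gated Meta repositories also require an authenticated Hugging Face token):

          # Hypothetical example: pull the model weights into a local directory
          # that the podman command above can then mount and serve
          huggingface-cli download meta-llama/Llama-3.3-70B-Instruct \
            --local-dir ~/models/llama-3.3-70b-instruct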

      Documentation Considerations:

      • See requirements

      Questions to answer:

      • Which vllm version is in RHELAI 1.4, and which is planned for 1.5?
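
      One way to check the 1.4 image (a sketch, assuming python3 and the vllm package are on the image's default path; the tag is the 1.4 build referenced above):

          # Illustrative: print the vllm version shipped in a given RHEL AI image
          IMAGE=registry.redhat.io/rhelai1/instructlab-nvidia-rhel9:1.4-1738905416
          podman run --rm --entrypoint python3 "$IMAGE" -c "import vllm; print(vllm.__version__)"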

      Background & Strategic Fit:

      Customers have been asking to leverage the latest and greatest third-party models from Meta, Mistral, Microsoft, Qwen, etc. within Red Hat AI products. As they continue to adopt and deploy open-source models, the third-party model validation pipeline provides inference performance benchmarking and accuracy evaluations for third-party models, giving customers confidence and predictability when bringing third-party models to InstructLab and vLLM within RHEL AI and RHOAI.

      See Red Hat AI Model Validation Strategy Doc

      See Red Hat Q1 2025 Third Party Model Validation Presentation
