Red Hat Enterprise Linux AI / RHELAI-4174

Enable vLLM v1 engine for Nvidia accelerator



      Goal: 

      As a user of RHEL AI, I want the best possible performance and throughput out of my inference server.

      vLLM's v1 engine is a flagship feature of recent vLLM releases, but we hit a bug late in the RHEL AI 1.5 cycle (RHELAI-4084) that required us to disable vLLM v1 for Nvidia accelerators. That bug fix has been merged upstream at https://github.com/vllm-project/vllm/pull/17855 but, as of the creation of this issue, has not yet been released.

      Once that fix lands in a vLLM release (likely 0.8.6 or later), we need to test reverting the swap to vLLM v0 (i.e. reverting https://gitlab.com/redhat/rhel-ai/containers/instructlab-nvidia/-/merge_requests/600) and ensure that inference, and specifically SDG, works with the v1 engine.
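
      For context, a minimal sketch of the toggle involved, assuming the container disables v1 by setting vLLM's VLLM_USE_V1 environment variable to 0 (the acceptance criteria below imply an environment variable is the mechanism; the exact setting and its location should be confirmed against MR 600 above):

      ```shell
      # Assumed current state: the container environment forces the legacy
      # v0 engine. Where this is set is part of MR 600, not shown here.
      export VLLM_USE_V1=0

      # The revert would drop that override so the release default applies;
      # the v1 engine can also be requested explicitly:
      export VLLM_USE_V1=1
      ```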

       

      Acceptance Criteria:

      • The instructlab-nvidia container does not disable vLLM v1 via an environment variable.
      • Serving models via `ilab model serve` uses the vLLM v1 engine and works with our supported models (granite variants, mixtral with adapters, prometheus).
      • `ilab data generate` completes successfully with the vLLM v1 engine using our default agentic pipeline, which runs against the mixtral teacher model with skills/knowledge adapters (see the sketch after this list).
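
      A rough verification sketch for these criteria, assuming a standard RHEL AI shell. The image name and model path are placeholders, and the startup log output that identifies the v1 engine should be checked against the vLLM release in question:

      ```shell
      # 1. Check that the image no longer sets the engine-selection variable.
      podman run --rm <instructlab-nvidia-image> env | grep VLLM_USE_V1 \
        || echo "VLLM_USE_V1 not set (release default applies)"

      # 2. Serve a supported model and confirm in the startup logs that the
      #    v1 engine initializes.
      ilab model serve --model-path <path-to-granite-model>

      # 3. Run SDG end to end with the default pipeline against the teacher
      #    model and confirm it completes successfully.
      ilab data generate
      ```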

      Assignee: Unassigned
      Reporter: Ben Browning (bbrownin@redhat.com)