  Project: Red Hat Enterprise Linux AI
  Issue: RHELAI-3248

Gaudi: vLLM v0.6.4.post2 inference fails for 1.4-dev-freeze



      To Reproduce
      Steps to reproduce the behavior:

      1. podman pull registry.stage.redhat.io/rhelai1/instructlab-intel-rhel9:1.4-1738240991
      2. podman run -it --hooks-dir /tmp --device /dev/accel --device /dev/infiniband instructlab-intel-rhel9:1.4-1738240991 /bin/bash
      3. (app-root)$ PT_HPU_LAZY_MODE=1 PT_HPU_ENABLE_LAZY_COLLECTIVES=true vllm serve instructlab/granite-7b-lab --dtype bfloat16 --distributed-executor-backend mp -tp 4
      4. (app-root)$ vllm chat  # in a separate tmux session or similar
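
      For convenience, the reproduction steps above can be collected into a small dry-run script. This is a sketch: it only prints the commands from the steps above so they can be reviewed before running on a Gaudi host (the `repro_cmds` helper and the `IMAGE` variable are illustrative; the run step uses the fully qualified image name, which is equivalent to the short name in step 2):

      ```shell
      #!/bin/sh
      # Sketch: print the reproduction commands from the steps above.
      # Dry run only; paste a printed line into a shell on a Gaudi 3 host
      # with podman installed to actually execute it.
      IMAGE=registry.stage.redhat.io/rhelai1/instructlab-intel-rhel9:1.4-1738240991

      repro_cmds() {
        echo "podman pull $IMAGE"
        echo "podman run -it --hooks-dir /tmp --device /dev/accel --device /dev/infiniband $IMAGE /bin/bash"
        # The remaining commands run inside the container (app-root):
        echo "PT_HPU_LAZY_MODE=1 PT_HPU_ENABLE_LAZY_COLLECTIVES=true vllm serve instructlab/granite-7b-lab --dtype bfloat16 --distributed-executor-backend mp -tp 4"
        echo "vllm chat"
      }

      repro_cmds
      ```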

      Expected behavior

      • The vLLM server should begin its boot process, complete its warmup, and start serving the model.
      • The user should be able to chat with the model without failures.

      Device Info (please complete the following information):

      • Hardware Specs: 
        • 8xGaudi 3
      • OS Version: Ubuntu 22.04
      • InstructLab Version: 0.23.1
      • The machine was returned to SMC before a hardware dump could be recorded for this bug report.

      Bug impact

      • Inference via vLLM does not function in this image; users cannot serve models from their Gaudi 3 machines via vLLM.

      Known workaround

      • None

      Additional context

      • The version of vLLM present in this image should be the released v0.6.4 plus a commit that enables the MP backend for distributed inference across more than one Gaudi 3 device.
      • In the TP=1 case, vLLM seems to behave correctly. The attached logs reflect that.
      • In the TP=2 case, vLLM boots, but cannot handle an incoming request.
      • In the TP=4 case, vLLM cannot boot.
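
      The three TP cases above can be swept with a small loop. This is a sketch: `serve_cmd` is a hypothetical helper that prints the `vllm serve` invocation from the reproduction steps for a given tensor-parallel size, so the sweep is a dry run by default:

      ```shell
      #!/bin/sh
      # Sketch: print the vllm serve command for each tensor-parallel size,
      # to reproduce the three behaviors above (TP=1 OK, TP=2 boots but
      # fails on requests, TP=4 fails to boot). Dry run only; paste a
      # printed line into the container shell to actually launch it.
      serve_cmd() {
        echo "PT_HPU_LAZY_MODE=1 PT_HPU_ENABLE_LAZY_COLLECTIVES=true vllm serve instructlab/granite-7b-lab --dtype bfloat16 --distributed-executor-backend mp -tp $1"
      }

      for tp in 1 2 4; do
        serve_cmd "$tp"
      done
      ```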

      Attachments:
        1. vllm-tp1-granite7blab.txt (77 kB, James Kunstle)
        2. vllm-tp2-granite7blab.txt (268 kB, James Kunstle)
        3. vllm-tp4-granite7blab.txt (1.64 MB, James Kunstle)

              fjansen@redhat.com Frank Jansen
              rhn-support-jkunstle James Kunstle (Inactive)
