Red Hat Enterprise Linux AI / RHELAI-4036

RHEL AI 1.5 model serve fails on AMD


    • Bug
    • Resolution: Done
    • Undefined
    • None
    • rhelai-1.5
    • vLLM
    • None
    • False
    • False
    • Critical
    • Proposed

      Deployed rhel-ai-amd-azure-1.5-1745891847-x86_64.vhd.gz on an Azure Standard_ND96is_MI300X_v5 instance.

      Applied the workarounds from:

      https://issues.redhat.com/browse/RHELAI-4033

      https://issues.redhat.com/browse/RHELAI-4034

      Model serve fails with the following traceback (full log attached):

      (VllmWorkerProcess pid=79) ERROR 04-29 14:34:20 [multiproc_worker_utils.py:238] Exception in worker VllmWorkerProcess while processing method determine_num_available_blocks

              prarit@redhat.com Prarit Bhargava
              fzatlouk@redhat.com František Zatloukal