AIPCC-2101

vLLM FA fix for CUTE_ARCH_MMA_SM90A_ENABLED is flawed

    • Critical

      AIPCC-1634 attempted to fix the error

      Attempting to use wgmma.fence without CUTE_ARCH_MMA_SM90A_ENABLED

      by adding the env var "VLLM_FA_CMAKE_GPU_ARCHES" to the builder repo. That fix is flawed and does not take effect, because the value was quoted. The build argument was defined in the build args file as:

      VLLM_FA_CMAKE_GPU_ARCHES='80-real;90-real'

      However, Podman does not strip quotes from build args, so the single quotes become part of the value. The line should be:

      VLLM_FA_CMAKE_GPU_ARCHES=80-real;90-real
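
A minimal sketch of a check that would catch this class of mistake before a build. It assumes the build args file holds one KEY=VALUE pair per line with optional `#` comments; that format, the helper name `find_quoted_build_args`, and the sample content are illustrative assumptions, not part of the builder repo.

```python
def find_quoted_build_args(text: str) -> list[tuple[str, str]]:
    """Return (name, unquoted_value) for every value wrapped in matching quotes.

    Hypothetical helper: the values are treated literally, the same way the
    issue describes Podman treating them.
    """
    findings = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines that are not assignments.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        # A value that starts and ends with the same quote character would be
        # passed through with those quotes included.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            findings.append((name, value[1:-1]))
    return findings


# Illustrative file content: the first line is the flawed one from this issue.
sample = "VLLM_FA_CMAKE_GPU_ARCHES='80-real;90-real'\nNVCC_THREADS=2\n"

for name, fixed in find_quoted_build_args(sample):
    print(f"{name}: quotes are part of the value; use {name}={fixed}")
```

The check only flags values that are fully wrapped in one pair of quotes, which is the pattern described here; it does not try to interpret quotes inside a value.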


      From the bootstrap.log of the most recent vLLM build for CUDA (https://gitlab.com/redhat/rhel-ai/rhaiis/pipeline/-/jobs/10415882372/artifacts/browse/mnt/):

      2025-06-20 14:54:13,203 DEBUG:fromager.external_commands:73: vllm: running: _PYPROJECT_HOOKS_BUILD_BACKEND=setuptools.build_meta MAKEFLAGS=-j22 CMAKE_BUILD_PARALLEL_LEVEL=22 MAX_JOBS=22 CMAKE_BUILD_TYPE=Release VLLM_INSTALL_PUNICA_KERNELS=1 VLLM_CUTLASS_SRC_DIR=/work/git-repos/cutlass GPU_ARCHS='7.5 8.0 8.6 8.7 8.9 9.0 10.0 12.0+PTX' INCLUDES='-I/work/git-repos/cutlass/include -I/work/git-repos/cutlass/tools/util/include' NVCC_THREADS=2 VLLM_FLASH_ATTN_SRC_DIR=/work/git-repos/vllm_flash_attn FLASH_MLA_SRC_DIR=/work/git-repos/vllm_flash_mla VLLM_TARGET_DEVICE=cuda VLLM_FA_CMAKE_GPU_ARCHES=''"'"'80-real;90-real'"'"'' /opt/app-root/lib64/python3.12/site-packages/fromager/run_network_isolation.sh /opt/app-root/bin/python3.12 /opt/app-root/lib64/python3.12/site-packages/pyproject_hooks/_in_process/_in_process.py get_requires_for_build_wheel /tmp/tmp3g2sf29b in /mnt/work-dir/vllm-0.9.0.1/vllm-0.9.0.1
      VLLM_FA_CMAKE_GPU_ARCHES=''"'"'80-real;90-real'"'"''
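
The doubled quoting in the logged assignment can be decoded with Python's `shlex` module. This sketch assumes the log renders each environment value with POSIX shell quoting equivalent to `shlex.quote`; that is an assumption about how the command line was logged, not something confirmed here. Under that assumption, the outer layer of quotes is shell escaping and the inner single quotes are literal characters in the value.

```python
import shlex

# The assignment as it appears in the log.
logged = """VLLM_FA_CMAKE_GPU_ARCHES=''"'"'80-real;90-real'"'"''"""

# shlex.split applies POSIX shell quoting rules, giving what the process sees.
(assignment,) = shlex.split(logged)
name, value = assignment.split("=", 1)
print(repr(value))

# The value really starts and ends with a single-quote character.
has_literal_quotes = value.startswith("'") and value.endswith("'")
print("literal quotes in value:", has_literal_quotes)

# Quoting the flawed value again reproduces the logged form.
requoted_flawed = f"{name}={shlex.quote(value)}"
print(requoted_flawed == logged)

# For comparison, the intended value needs one layer of quoting only,
# because of the ';' character.
intended = "80-real;90-real"
requoted_intended = f"{name}={shlex.quote(intended)}"
print(requoted_intended)
```

If the assumption holds, a log line produced after the fix should show `VLLM_FA_CMAKE_GPU_ARCHES='80-real;90-real'` with a single layer of quotes.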
      

              cheimes@redhat.com Christian Heimes