  AI Platform Core Components / AIPCC-3179

vLLM 0.9.0.1 needs flashinfer-python 0.2.5 with FLASHINFER_ENABLE_SM90=1

    • Sprint: AP Sprint 9
    • Priority: Critical

      vLLM 0.9.0.1 is not compatible with flashinfer-python 0.2.6.post1. The midstream releases use flashinfer-python 0.2.5 compiled with FLASHINFER_ENABLE_SM90=1 and FLASHINFER_ENABLE_AOT=1. flashinfer-python 0.2.6.post1, as well as builds of flashinfer-python 0.2.5 without FLASHINFER_ENABLE_SM90, lack support for wgmma.fence on the sm90a arch (Hopper).

      https://github.com/vllm-project/vllm/blob/v0.9.0.1/docker/Dockerfile#L262

          if [[ "$CUDA_VERSION" == 12.8* ]]; then \
              uv pip install --system https://download.pytorch.org/whl/cu128/flashinfer/flashinfer_python-0.2.5%2Bcu128torch2.7-cp38-abi3-linux_x86_64.whl; \
          else
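
      A quick way to check whether an environment is likely affected is to confirm which flashinfer-python build is installed and whether the GPU is Hopper. This is only a triage sketch; the affected versions and flag are the ones named in the description above.

          # Show which flashinfer-python build is installed (affected: 0.2.6.post1,
          # or 0.2.5 built without FLASHINFER_ENABLE_SM90=1).
          pip show flashinfer-python

          # Hopper (sm90a) reports compute capability (9, 0); the wgmma.fence
          # failure only triggers on this architecture.
          python -c "import torch; print(torch.cuda.get_device_capability())"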
      

      Solution: downgrade flashinfer-python from 0.2.6.post1 to 0.2.5 and build it with FLASHINFER_ENABLE_SM90=1. This should fix the runtime error seen on Hopper:

          Attempting to use wgmma.fence without CUTE_ARCH_MMA_SM90A_ENABLED
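
      A minimal sketch of the workaround, assuming the prebuilt cu128 wheel pinned by the vLLM Dockerfile above suits the target environment; the build-from-source variant assumes the flashinfer source build honors the FLASHINFER_ENABLE_SM90 and FLASHINFER_ENABLE_AOT variables named in the description:

          # Option 1: install the prebuilt 0.2.5+cu128 wheel that vLLM 0.9.0.1 pins.
          uv pip install --system \
              https://download.pytorch.org/whl/cu128/flashinfer/flashinfer_python-0.2.5%2Bcu128torch2.7-cp38-abi3-linux_x86_64.whl

          # Option 2 (assumption: a source build honors these flags): build 0.2.5
          # with SM90 (Hopper) and AOT kernels enabled.
          FLASHINFER_ENABLE_SM90=1 FLASHINFER_ENABLE_AOT=1 \
              pip install --no-binary flashinfer-python flashinfer-python==0.2.5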

              Assignee: Christian Heimes (cheimes@redhat.com)
              Reporter: Christian Heimes (cheimes@redhat.com)
              Team: Antonio's Team
              Votes: 0
              Watchers: 4
