Type: Bug
Resolution: Done
Sprint: AP Sprint 9
Priority: Critical
vLLM 0.9.0.1 is not compatible with flashinfer-python 0.2.6.post1. The midstream releases use flashinfer-python 0.2.5 compiled with FLASHINFER_ENABLE_SM90=1 and FLASHINFER_ENABLE_AOT=1. flashinfer-python 0.2.6.post1, as well as builds of flashinfer-python 0.2.5 without FLASHINFER_ENABLE_SM90, lack support for wgmma.fence on the sm90a arch (Hopper).
https://github.com/vllm-project/vllm/blob/v0.9.0.1/docker/Dockerfile#L262
if [[ "$CUDA_VERSION" == 12.8* ]]; then \ uv pip install --system https://download.pytorch.org/whl/cu128/flashinfer/flashinfer_python-0.2.5%2Bcu128torch2.7-cp38-abi3-linux_x86_64.whl; \ else
Observed error:
Attempting to use wgmma.fence without CUTE_ARCH_MMA_SM90A_ENABLED
Solution: Downgrade flashinfer-python from 0.2.6.post1 to 0.2.5 and enable the SM90 flag (FLASHINFER_ENABLE_SM90=1) when building it. This should fix the issue.
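A minimal sketch of the downgrade, assuming the prebuilt cu128/torch2.7 wheel pinned in the vLLM Dockerfile above is suitable for the target image; the source-build alternative uses the build flags named in the description, and its exact invocation is an assumption:

# Remove the incompatible release
pip uninstall -y flashinfer-python
# Option A: install the same prebuilt 0.2.5 wheel that vLLM 0.9.0.1 pins for CUDA 12.8
pip install https://download.pytorch.org/whl/cu128/flashinfer/flashinfer_python-0.2.5%2Bcu128torch2.7-cp38-abi3-linux_x86_64.whl
# Option B (assumed invocation): build 0.2.5 from source with SM90 kernels enabled,
# using the flags named in the description above
# FLASHINFER_ENABLE_SM90=1 FLASHINFER_ENABLE_AOT=1 pip install --no-binary flashinfer-python flashinfer-python==0.2.5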