Type: Bug
Resolution: Done
Sprint: AP Sprint 9
Priority: Critical
vLLM 0.9.0.1 is not compatible with flashinfer-python 0.2.6.post1. The midstream releases use flashinfer-python 0.2.5 compiled with FLASHINFER_ENABLE_SM90=1 and FLASHINFER_ENABLE_AOT=1. flashinfer-python 0.2.6.post1, as well as builds of flashinfer-python 0.2.5 without FLASHINFER_ENABLE_SM90, lack support for wgmma.fence on the sm90a arch (Hopper).
https://github.com/vllm-project/vllm/blob/v0.9.0.1/docker/Dockerfile#L262
if [[ "$CUDA_VERSION" == 12.8* ]]; then \ uv pip install --system https://download.pytorch.org/whl/cu128/flashinfer/flashinfer_python-0.2.5%2Bcu128torch2.7-cp38-abi3-linux_x86_64.whl; \ else
Observed error:
Attempting to use wgmma.fence without CUTE_ARCH_MMA_SM90A_ENABLED
Solution: Downgrade flashinfer-python from 0.2.6.post1 to 0.2.5 and enable the SM90 flag (FLASHINFER_ENABLE_SM90=1) when building it. This should fix the issue.
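A minimal sketch of the downgrade, assuming the prebuilt cu128/torch2.7 wheel pinned in the vLLM Dockerfile above is suitable for the target image; the source-build alternative uses the build flags named in the description, and its exact invocation is an assumption:

# Remove the incompatible release
pip uninstall -y flashinfer-python
# Option A: install the same prebuilt 0.2.5 wheel that vLLM 0.9.0.1 pins for CUDA 12.8
pip install https://download.pytorch.org/whl/cu128/flashinfer/flashinfer_python-0.2.5%2Bcu128torch2.7-cp38-abi3-linux_x86_64.whl
# Option B (assumed invocation): build 0.2.5 from source with SM90 kernels enabled,
# using the flags named in the description above
# FLASHINFER_ENABLE_SM90=1 FLASHINFER_ENABLE_AOT=1 pip install --no-binary flashinfer-python flashinfer-python==0.2.5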