Bug
Resolution: Unresolved
Major
Description
Summary: The scaled_mm_v2 decomposition test fails on CPU and sGPU platforms because the installed cublasLt is older than the required version 12.9.
Test Class: test/test_decomp.py::TestDecompCUDA
Number of Failing Tests: 1
Platform: CPU, sGPU (CUDA)
Test Type: Unit Test
Version Information:
- PyTorch Commit: 6bdd8c9
- Branch: main
- Test Date: 2026-01-14
- Sprint: Sprint 24
Failure Pattern:
Single root cause: the installed cublasLt is older than the 12.9 required for DeepSeek-style scaling.
Common Error:
{code}
Exception: DeepSeek style (1x128, 128x128) scaling requires cublasLt >= 12.9
Exception raised from _check_deepseek_support at /pytorch/aten/src/ATen/native/cuda/ScaledBlas.cpp:787
{code}
Failing Tests:
1. test_comprehensive_torch__scaled_mm_v2_cuda_float8_e4m3fn
Steps to Reproduce:
{code}
TEST_CONFIG=cpu python3 test/run_test.py -i test_decomp
TEST_CONFIG=cuda python3 test/run_test.py -i test_decomp
{code}
Expected Result:
The test passes, or is skipped when the installed cublasLt is older than 12.9.
Actual Result:
The test fails with an exception stating that cublasLt >= 12.9 is required.
Root Cause Analysis:
Same root cause as the TestCommonCUDA failure: DeepSeek-style (1x128, 128x128) block-wise scaling requires cublasLt 12.9 or higher, and _check_deepseek_support raises when the available version is older.
Potential Solutions:
1. Upgrade the CUDA toolkit to a version that includes cublasLt 12.9+
2. Add a version check that skips the test when cublasLt < 12.9
3. Mark the test as an expected failure on environments with older cublasLt
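Option 2 could look roughly like the sketch below. It is illustrative only, not the decorator the PyTorch test suite actually uses: the helper names (parse_version, cublaslt_too_old, detected_cuda_version) and the example test class are hypothetical, and it assumes the CUDA toolkit version reported by torch.version.cuda is an acceptable proxy for the cublasLt version, which may not hold in every environment.

```python
import unittest

# Minimum version named in the exception message.
REQUIRED_CUBLASLT = (12, 9)


def parse_version(version):
    """Turn '12.8' or '12.9.1' into (major, minor); return None if unknown."""
    if not version:
        return None
    try:
        return tuple(int(part) for part in str(version).split(".")[:2])
    except ValueError:
        return None


def cublaslt_too_old(version, required=REQUIRED_CUBLASLT):
    """True when the version is unknown or lower than the required one."""
    parsed = parse_version(version)
    return parsed is None or parsed < required


def detected_cuda_version():
    """Assumption: the CUDA toolkit version stands in for the cublasLt version."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda  # None on CPU-only builds


class ScaledMMDeepSeekExample(unittest.TestCase):
    # Hypothetical stand-in for the real decomposition test.
    @unittest.skipIf(
        cublaslt_too_old(detected_cuda_version()),
        "DeepSeek-style scaling requires cublasLt >= 12.9",
    )
    def test_deepseek_scaling(self):
        pass  # the real test body would exercise the scaled_mm_v2 decomposition
```

Treating an unknown version as too old makes the test skip on CPU-only builds instead of failing there. For option 3, unittest.expectedFailure could replace the skipIf decorator, although the test would then be reported as an unexpected success once the environment is upgraded.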
Priority: P2