Type: Bug
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: None
Component/s: PyTorch
Labels:
- pytorch_qa

Blocked: False
Blocked Reason: None
Ready: False
Description

Summary: The decomposition test for scaled_mm_v2 fails on CPU and sGPU platforms because of a cublasLt version requirement.

Test Class: test/test_decomp.py::TestDecompCUDA
Number of Failing Tests: 1
Platform: CPU, sGPU (CUDA)
Test Type: Unit Test

Version Information:

PyTorch Commit: 6bdd8c9
Branch: main
Test Date: 2026-01-14
Sprint: Sprint 24

Failure Pattern:
Single root cause: the available cublasLt version is too old for DeepSeek-style scaling.

Common Error:
{code}
Exception: DeepSeek style (1x128, 128x128) scaling requires cublasLt >= 12.9
Exception raised from _check_deepseek_support at /pytorch/aten/src/ATen/native/cuda/ScaledBlas.cpp:787
{code}

Failing Tests:
1. test_comprehensive_torch__scaled_mm_v2_cuda_float8_e4m3fn

Steps to Reproduce:
{code}
TEST_CONFIG=cpu python3 test/run_test.py -i test_decomp
TEST_CONFIG=cuda python3 test/run_test.py -i test_decomp
{code}

Expected Result:
The test should pass, or be skipped when the available cublasLt version is below 12.9.

Actual Result:
The test fails with an exception stating that cublasLt >= 12.9 is required.

Root Cause Analysis:
Same root cause as the TestCommonCUDA failure: DeepSeek-style block-wise scaling support requires cublasLt 12.9 or higher.

Potential Solutions:
1. Upgrade the CUDA toolkit to a version that includes cublasLt 12.9+
2. Add a version check that skips the test when cublasLt < 12.9
3. Mark the test as an expected failure in environments with an older cublasLt
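
As a rough illustration of solution 2, the version gate could look something like the sketch below. It is not the actual PyTorch test-suite mechanism. The helper names (parse_major_minor, supports_deepseek_scaling) and the ExampleScaledMMTest class are hypothetical, and using torch.version.cuda as a stand-in for the cublasLt version is an assumption that would need to be checked against how _check_deepseek_support decides.

```python
import unittest
from typing import Optional, Tuple

# Minimum version taken from the error message in this report.
MIN_CUBLASLT = (12, 9)


def parse_major_minor(version: Optional[str]) -> Optional[Tuple[int, int]]:
    """Turn '12.8' or '12.9.1' into (12, 8) or (12, 9); None if unparseable."""
    if not version:
        return None
    parts = version.split(".")
    try:
        major = int(parts[0])
        minor = int(parts[1]) if len(parts) > 1 else 0
    except ValueError:
        return None
    return (major, minor)


def supports_deepseek_scaling(version: Optional[str]) -> bool:
    """Hypothetical gate: True only when the version is known and >= 12.9."""
    parsed = parse_major_minor(version)
    return parsed is not None and parsed >= MIN_CUBLASLT


# Assumption: the CUDA toolkit version is used as a proxy for the cublasLt
# version. torch.version.cuda is None on builds without CUDA.
try:
    import torch
    _cuda_version = torch.version.cuda
except ImportError:
    _cuda_version = None


class ExampleScaledMMTest(unittest.TestCase):
    # Hypothetical test class, shown only to illustrate the skip decorator.
    @unittest.skipIf(
        not supports_deepseek_scaling(_cuda_version),
        "DeepSeek style (1x128, 128x128) scaling requires cublasLt >= 12.9",
    )
    def test_blockwise_scaling(self):
        """Placeholder body; the real test would exercise scaled_mm_v2."""
```

With a gate like this, environments with an older or missing CUDA toolkit would report the test as skipped instead of failed. Solution 3 could use unittest.expectedFailure in a similar position, though that would turn into an unexpected success once the toolkit is upgraded.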

Priority: P2

Assignee: Unassigned

Reporter: PyTorch Engineering

Votes: 0

Watchers: 1

Created: 2026/01/20 8:23 AM

Updated: 2026/01/22 5:23 AM
