AI Platform Core Components / AIPCC-8947

[QA][PyTorch UT][CPU, sGPU] test/test_decomp.py - TestDecompCUDA failure


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • PyTorch

      Description

      Summary: The decomposition test for scaled_mm_v2 fails on the CPU and sGPU platforms because the installed cublasLt does not meet the version requirement.

      Test Class: test/test_decomp.py::TestDecompCUDA
      Number of Failing Tests: 1
      Platform: CPU, sGPU (CUDA)
      Test Type: Unit Test

      Version Information:

      • PyTorch Commit: 6bdd8c9
      • Branch: main
      • Test Date: 2026-01-14
      • Sprint: Sprint 24

      Failure Pattern:
      Single root cause - cublasLt version insufficient for DeepSeek-style scaling

      Common Error:
      {code}
      Exception: DeepSeek style (1x128, 128x128) scaling requires cublasLt >= 12.9
      Exception raised from _check_deepseek_support at /pytorch/aten/src/ATen/native/cuda/ScaledBlas.cpp:787
      {code}
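For triage, the error signature above can be matched in test logs. The following is a minimal sketch, not part of the PyTorch test suite: the function name `required_cublaslt` and the regex group names are hypothetical, and the pattern assumes the message text stays exactly as quoted.

```python
import re

# Pattern for the exception text quoted above.
# The group names "major" and "minor" are illustrative only.
DEEPSEEK_ERROR = re.compile(
    r"DeepSeek style \(1x128, 128x128\) scaling requires "
    r"cublasLt >= (?P<major>\d+)\.(?P<minor>\d+)"
)


def required_cublaslt(log_text):
    """Return the (major, minor) version the error asks for, or None."""
    match = DEEPSEEK_ERROR.search(log_text)
    if match is None:
        return None
    return int(match["major"]), int(match["minor"])
```

A log scanner could call `required_cublaslt` on each failure message and group failures that return the same version under one root cause.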

      Failing Tests:
      1. test_comprehensive_torch__scaled_mm_v2_cuda_float8_e4m3fn

      Steps to Reproduce:
      {code}
      TEST_CONFIG=cpu python3 test/run_test.py -i test_decomp
      TEST_CONFIG=cuda python3 test/run_test.py -i test_decomp
      {code}

      Expected Result:
      The test should pass, or be skipped when the cublasLt version is below 12.9

      Actual Result:
      Test fails with exception indicating cublasLt >= 12.9 is required

      Root Cause Analysis:
      Same root cause as the TestCommonCUDA failure: the test exercises DeepSeek-style block-wise (1x128, 128x128) scaling, which requires cublasLt 12.9 or higher.

      Potential Solutions:
      1. Upgrade the CUDA toolkit to a version that includes cublasLt 12.9+
      2. Add version check to skip test when cublasLt < 12.9
      3. Mark test as expected failure on environments with older cublasLt
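Option 2 could look roughly like the sketch below. It is an illustration under stated assumptions, not the fix applied in PyTorch: `supports_deepseek_scaling` and `skip_unless_deepseek_scaling` are hypothetical helpers, and the caller is assumed to supply the cublasLt version as a "major.minor" string, since this ticket does not say how the test suite obtains it.

```python
import unittest

# Minimum cublasLt version taken from the error message in this ticket.
MIN_CUBLASLT = (12, 9)


def parse_version(text):
    """Turn a 'major[.minor[.patch]]' string into a (major, minor) tuple."""
    parts = text.split(".") + ["0"]  # pad so a bare "12" parses as 12.0
    return int(parts[0]), int(parts[1])


def supports_deepseek_scaling(version_text):
    """Return True when the version meets the 12.9 requirement.

    A missing version (None) is treated as unsupported.
    """
    if version_text is None:
        return False
    return parse_version(version_text) >= MIN_CUBLASLT


def skip_unless_deepseek_scaling(version_text):
    """Hypothetical decorator: skip a test when cublasLt is too old."""
    return unittest.skipUnless(
        supports_deepseek_scaling(version_text),
        "DeepSeek-style scaling requires cublasLt >= 12.9",
    )
```

A test method decorated with `@skip_unless_deepseek_scaling(version)` is then reported as skipped instead of failed on older environments. One possible source for `version` is `torch.version.cuda`, but that reports the CUDA toolkit PyTorch was built against and is only a proxy for the cublasLt version loaded at runtime.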

      Priority: P2

              Assignee: Unassigned
              Reporter: PyTorch Engineering (pytorch-engineering)