AI Platform Core Components / AIPCC-8264

[QA][PyTorch UT][sGPU] test/inductor/test_torchinductor_strided_blocks.py - TritonTensorDescriptorTestCUDA failures

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • PyTorch

        Fix Applied - MR Raised - https://gitlab.com/redhat/rhel-ai/team-pytorch/pytorch/-/merge_requests/186

        Root cause: Test test_2d_reduction_multi_kernel_cuda crashes the process on CUDA with TMA descriptors.

        Solution: Added test to TMA_TEST_XFAIL dictionary with is_skip=True in test/inductor/test_torchinductor_strided_blocks.py.

        Branch: fix/skip-test-2d-reduction-multi-kernel-tma-crash

        Status: MR raised for review. Will push upstream to PyTorch after validation.

        Testing: Fix validated - test now skips cleanly without crashing the process.
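
        The solution above can be sketched as follows. This is a minimal, self-contained illustration, not the code from the MR: the `TestFailure` tuple here is a stand-in assumed to resemble the helper used by the inductor tests, `apply_failures` and `DemoTests` are hypothetical names, and the real TMA_TEST_XFAIL dictionary contains other entries that are not shown.

        ```python
        import unittest
        from typing import NamedTuple


        class TestFailure(NamedTuple):
            """Stand-in for the inductor test helper (assumed shape, not the real class)."""
            suffixes: tuple
            is_skip: bool = False


        # Hypothetical entry: the key is the test name without its device suffix.
        TMA_TEST_XFAIL = {
            "test_2d_reduction_multi_kernel": TestFailure(("cuda",), is_skip=True),
        }


        def apply_failures(cls, suffix, failures):
            """Mark listed tests as skipped (never executed) or expected-failure (executed)."""
            for name, failure in failures.items():
                if suffix not in failure.suffixes:
                    continue
                full_name = f"{name}_{suffix}"
                fn = getattr(cls, full_name)
                if failure.is_skip:
                    # Skipped tests are never called, so a crash cannot happen.
                    setattr(cls, full_name, unittest.skip("crashes the process")(fn))
                else:
                    # Expected failures still run, which is unsafe for a crashing test.
                    setattr(cls, full_name, unittest.expectedFailure(fn))
            return cls


        class DemoTests(unittest.TestCase):
            def test_2d_reduction_multi_kernel_cuda(self):
                raise RuntimeError("stand-in for the process crash")


        apply_failures(DemoTests, "cuda", TMA_TEST_XFAIL)


        def run_demo():
            """Run the demo class and return the unittest result object."""
            suite = unittest.defaultTestLoader.loadTestsFromTestCase(DemoTests)
            result = unittest.TestResult()
            suite.run(result)
            return result
        ```

        The design point is the is_skip flag: an expected-failure marker still executes the test body, so it cannot contain a crash that kills the test process, whereas a skip prevents the test from running at all.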


      *Test Class:* test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA
      *Failing Tests:* 1
      *Error Pattern:* related_issues

      Description

      Summary:
      1 test in TritonTensorDescriptorTestCUDA is failing during PyTorch unit test execution on the sGPU platform.

      Test Class: test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA
      Number of Failing Tests: 1
      Platform: sGPU
      Test Type: Unit Test

      Version Information:

      • PyTorch Commit: 4816fd9
      • Test Date: 2025-12-22
      • Pipeline ID: 2217097191
      • Platform: sGPU

      Failure Pattern:
      The failure shows 2 related error patterns, which likely share a common root cause.

      Error Patterns:
      1. test/inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_2d_reduction_multi_kernel_cuda
      2. File "/pytorch/test/test_expanded_weights.py", line 1032, in <lambda>

      Failing Tests:
      1. test_2d_reduction_multi_kernel_cuda

      Steps to Reproduce:
      1. Pull the PyTorch test image
      2. Run the failing test class:

         TEST_CONFIG=cuda python3 test/run_test.py -i inductor/test_torchinductor_strided_blocks
         

      3. Observe test failures
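
      Because the failing test crashes the process rather than reporting an ordinary failure, it can help to run it in a child process and inspect the exit status. The sketch below is one way to do that under stated assumptions: `run_isolated` is a hypothetical helper, the signal interpretation applies to POSIX systems, and the single-test command in the trailing comment is an assumption about how the test file is invoked, not a command taken from this issue.

      ```python
      import signal
      import subprocess
      import sys


      def run_isolated(cmd, timeout=None):
          """Run a command in a child process and classify how it ended."""
          proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
          if proc.returncode == 0:
              return "passed-or-skipped"
          if proc.returncode < 0:
              # On POSIX, a negative return code means the child was killed by a signal.
              return f"crashed ({signal.Signals(-proc.returncode).name})"
          return f"failed (exit code {proc.returncode})"


      # Assumed usage from the root of a PyTorch checkout (not verified here):
      # run_isolated([sys.executable,
      #               "test/inductor/test_torchinductor_strided_blocks.py",
      #               "-k", "test_2d_reduction_multi_kernel_cuda"])
      ```

      A "crashed" result distinguishes the process-level crash described in this issue from a normal assertion failure, which would surface as a non-zero exit code instead.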

      Expected Result:
      All tests in TritonTensorDescriptorTestCUDA should pass

      Actual Result:
      1 test fails with the errors shown above.

      Logs:
      Pipeline ID: 2217097191
      CI Artifacts: Available in pipeline artifacts

      Additional Context:
      Test failures identified in automated PyTorch CI run.

      Severity: Medium
      Priority: P3

              rh-ee-arsamant ARINDAM SAMANTA
              rh-ee-ktanmay Kumar Tanmay