AI Platform Core Components / AIPCC-8515

[PyTorch][Upstream CI] Fix CUTLASS Backend Test Failures on RHEL H200


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Undefined
    • PyTorch

      Problem

      All 189 tests in inductor/test_cutlass_backend.py fail on RHEL 9.6 with H200 GPU.

      Root Cause

      The CUTLASS library (NVIDIA CUDA Templates for Linear Algebra Subroutines) is either missing from the RHEL build environment or installed where PyTorch cannot find it.
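
      One way to confirm this diagnosis inside the RHEL image is to check whether a CUTLASS source tree is present at all. The following is a minimal sketch, not the project's own tooling. The helper name find_cutlass_root and the candidate paths are hypothetical, and treating the TORCHINDUCTOR_CUTLASS_DIR environment variable as the override is an assumption that should be checked against the PyTorch version in use. The check relies only on the fact that a CUTLASS checkout ships the header include/cutlass/cutlass.h.

```python
import os
from pathlib import Path
from typing import Iterable, Optional

# Header that a CUTLASS checkout provides, relative to its root.
CUTLASS_MARKER = Path("include") / "cutlass" / "cutlass.h"


def find_cutlass_root(candidates: Iterable[str]) -> Optional[Path]:
    """Return the first candidate directory that contains the CUTLASS marker header."""
    for candidate in candidates:
        if not candidate:
            continue  # skip unset or empty entries
        root = Path(candidate)
        if (root / CUTLASS_MARKER).is_file():
            return root
    return None


if __name__ == "__main__":
    # Candidate locations are illustrative assumptions, not confirmed paths.
    candidates = [
        os.environ.get("TORCHINDUCTOR_CUTLASS_DIR", ""),
        "/opt/cutlass",
        "/usr/local/cutlass",
    ]
    root = find_cutlass_root(candidates)
    print(f"CUTLASS found at {root}" if root else "CUTLASS not found in any candidate path")
```

      A "not found" result points to a missing installation. A found path that the tests still cannot use points to a configuration problem instead.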

      Impact

      • Tests failing: 189
      • Severity: Low - CUTLASS is an optional optimization library
      • Production impact: None - Core PyTorch functionality works without CUTLASS
      • Pass rate impact: Accounts for 88% of all failures (189/215)
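
      The 88% figure follows directly from the two counts above. As a quick check, the sketch below recomputes it. The variable names are illustrative, and the remaining-failure count assumes that every CUTLASS test passes once the library is available.

```python
cutlass_failures = 189  # failures in inductor/test_cutlass_backend.py
total_failures = 215    # all failures reported for the run

share = cutlass_failures / total_failures
remaining = total_failures - cutlass_failures  # assumes all CUTLASS tests pass after the fix

print(f"CUTLASS share of failures: {share:.1%}")
print(f"Failures left after a fix: {remaining}")
```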

      Current Workaround

      The CUTLASS backend tests are currently excluded in the CI workflow configuration.

      Acceptance Criteria

      • [ ] CUTLASS library installed in RHEL Docker image
      • [ ] PyTorch configured to find CUTLASS installation
      • [ ] All 189 CUTLASS backend tests pass on RHEL H200
      • [ ] Tests remain excluded until fix is verified
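
      The third criterion could be checked mechanically before the exclusion is lifted. The sketch below assumes the test file is run with pytest's --junitxml option and that the resulting report is read into a string. The function names are hypothetical, and counting skipped tests as not passing is an assumption about how strictly "pass" should be read.

```python
import xml.etree.ElementTree as ET

EXPECTED_TESTS = 189  # test count stated in this issue


def summarize_junit(xml_text: str) -> dict:
    """Sum the counters over every <testsuite> element in a JUnit XML report."""
    root = ET.fromstring(xml_text)
    totals = {"tests": 0, "failures": 0, "errors": 0, "skipped": 0}
    for suite in root.iter("testsuite"):
        for key in totals:
            totals[key] += int(suite.get(key, 0))
    return totals


def criteria_met(totals: dict, expected: int = EXPECTED_TESTS) -> bool:
    """True only if all expected tests ran and none failed, errored, or were skipped."""
    return (
        totals["tests"] == expected
        and totals["failures"] == 0
        and totals["errors"] == 0
        and totals["skipped"] == 0
    )
```

      If criteria_met returns False, the tests stay excluded, in line with the last criterion.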

              Assignee: Subin George (rh-ee-sugeorge)
              Reporter: Subin George (rh-ee-sugeorge)
              PyTorch Infrastructure