  AI Platform Core Components
  AIPCC-8520

[PyTorch][Upstream CI] Fix SDPA CPU Kernel Numerical Precision Issues

    • Type: Bug
    • Resolution: Unresolved
    • Component: PyTorch
      Problem

      Two scaled dot product attention (SDPA) CPU kernel tests fail on RHEL due to minor numerical precision differences.

      Root Cause

      The CPU-based fused attention kernel produces slightly different numerical results due to floating-point arithmetic differences and GCC 11 compiler optimizations.
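      A minimal reproduction sketch along these lines (not the actual failing test) makes the discrepancy visible: it runs the same SDPA call on CPU through the unfused math reference and through a fused backend, then reports the largest element-wise difference. The backend selection assumes a PyTorch release that exposes torch.nn.attention.sdpa_kernel; backend availability on CPU varies by version, and the tensor shapes below are illustrative only.

      import torch
      import torch.nn.functional as F
      from torch.nn.attention import SDPBackend, sdpa_kernel

      torch.manual_seed(0)
      # Illustrative shapes (batch, heads, seq_len, head_dim); the failing tests use their own.
      q, k, v = (torch.randn(1, 4, 128, 64, dtype=torch.float32) for _ in range(3))

      with sdpa_kernel(SDPBackend.MATH):              # unfused reference path
          ref = F.scaled_dot_product_attention(q, k, v)

      with sdpa_kernel(SDPBackend.FLASH_ATTENTION):   # fused CPU kernel
          fused = F.scaled_dot_product_attention(q, k, v)

      diff = (ref - fused).abs()
      print(f"max abs diff: {diff.max().item():.3e}")
      print(f"elements above an illustrative 1e-5 threshold: "
            f"{(diff > 1e-5).sum().item()} / {ref.numel()}")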

      Impact

      • Tests failing: 2
      • Severity: Very Low - numerical precision issue, not a functional failure
      • Production impact: None - the attention mechanism works correctly
      • Mismatched elements: 1 / 1,632 (0.1%)

      Proposed Solutions

      Option 1: Relax the test tolerance for RHEL builds (a sketch follows this list)
      Option 2: Exclude the specific failing test variants
      Option 3: Report the discrepancy to PyTorch upstream for investigation
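
      A minimal sketch of Option 1, assuming RHEL is detected from /etc/os-release and using illustrative tolerance values rather than the ones that would land in the real patch: the comparison is relaxed only on RHEL builds and stays at the PyTorch defaults everywhere else.

      import torch

      def on_rhel() -> bool:
          # Assumption: detect RHEL from /etc/os-release; real CI may use another signal.
          try:
              with open("/etc/os-release") as f:
                  return "Red Hat Enterprise Linux" in f.read()
          except OSError:
              return False

      def assert_sdpa_close(actual: torch.Tensor, expected: torch.Tensor) -> None:
          if on_rhel():
              # Relaxed bounds to absorb the single outlier element seen on RHEL.
              torch.testing.assert_close(actual, expected, rtol=1e-4, atol=1e-4)
          else:
              torch.testing.assert_close(actual, expected)

      Option 2 would instead skip the two affected test variants on RHEL, for example with a skip condition built on the same platform detection.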

      Acceptance Criteria

      • [ ] Root cause fully understood
      • [ ] Fix implemented (tolerance relaxation or exclusion)
      • [ ] Tests pass consistently on RHEL
      • [ ] No regression in actual SDPA functionality

      References

      • Analysis: TEST_FAILURE_REPORT.md
      • Test file: test/test_transformers.py

              Assignee: Subin George (rh-ee-sugeorge)
              Reporter: Subin George (rh-ee-sugeorge)
              Team: PyTorch Infrastructure