Bug
Resolution: Unresolved
Minor
Description
Summary:
One test in the TestForeachCUDA test class fails during PyTorch CPU unit test execution with a RuntimeError raised from tensor comparison.
Test Class: test_foreach.py::TestForeachCUDA
Number of Failing Tests: 1
Platform: CPU
Test Type: Unit Test
Version Information:
- PyTorch Commit: 6bdd8c9
- Branch: main
- Test Date: 2026-01-14
- Python Version: 3.12.11
- Sprint: Sprint 24
Failure Pattern:
Single root cause - tensor comparison failure
Common Error:
RuntimeError: Comparing
TensorOrArrayPair(
id=(),
actual=tensor([1., 1., 1., ..., 1., 1., 1.], device="cuda:0"),
expected=tensor([1., 1., 1., ..., 1., 1., 1.], device="cuda:0"),
rtol=1.3e-06,
atol=1e-05,
equal_nan=True,
check_device=False,
resulted in the unexpected exception above. If you are a user and see this message during normal operation please file an issue at https://github.com/pytorch/pytorch/issues.
Failing Tests:
1. test_foreach_copy_with_multi_dtypes_large_input_cuda
Steps to Reproduce:
1. Set up PyTorch environment with Python 3.12
2. Execute the failing test:
TEST_CONFIG=cpu python3 test/run_test.py -i test_foreach
TEST_CONFIG=cuda python3 test/run_test.py -i test_foreach
3. Observe RuntimeError during tensor comparison
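To shorten the reproduction loop, the run could be narrowed to the single failing test. The sketch below only builds the command and does not execute anything. It rests on assumptions not confirmed in this ticket: that test/test_foreach.py can be run directly and honors unittest's -k name filter, and that TEST_CONFIG is still relevant for a direct run. The helper name build_single_test_command is hypothetical.

```python
import os
import sys

# Test name copied from the "Failing Tests" section of this ticket.
FAILING_TEST = "test_foreach_copy_with_multi_dtypes_large_input_cuda"


def build_single_test_command(test_config="cuda"):
    """Return (argv, env) for running only the failing test.

    Assumption: test/test_foreach.py can be executed directly and accepts
    unittest's -k filter. Nothing is executed by this function.
    """
    argv = [sys.executable, "test/test_foreach.py", "-k", FAILING_TEST]
    env = dict(os.environ, TEST_CONFIG=test_config)
    return argv, env


if __name__ == "__main__":
    argv, env = build_single_test_command()
    print("TEST_CONFIG=" + env["TEST_CONFIG"], " ".join(argv))
```

The returned pair can be passed to subprocess.run(argv, env=env) from the repository root.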
Expected Result:
Test should pass with foreach copy operations producing correct tensor values
Actual Result:
Test fails with RuntimeError during tensor comparison despite values appearing identical
Root Cause Analysis:
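The failure indicates:
- The tensor comparison framework raised an unexpected exception rather than reporting a value mismatch; per the error text, the underlying exception is printed above this message in the traceback and is not captured in this report
- The displayed values appear identical, so the failure is more likely an internal error than a numerical difference
- Possible issue with multi-dtype handling in foreach copy operations
- Possible dtype conversion or comparison logic bug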
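Because the error text points to an "unexpected exception above", the RuntimeError is probably a wrapper around another exception. The sketch below shows one way to walk a Python exception chain back to its origin. The inner ValueError is a placeholder only, since the real inner exception is not recorded in this ticket, and the helper names are hypothetical.

```python
def exception_chain(exc):
    """Return the exceptions from `exc` back to its root cause.

    Follows __cause__ (set by `raise ... from err`) and falls back to
    __context__ (set implicitly when raising inside an except block).
    """
    chain = []
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))
        chain.append(exc)
        exc = exc.__cause__ or exc.__context__
    return chain


def simulated_comparison():
    # Stand-in for the comparison framework. The real inner exception is
    # unknown, so a placeholder ValueError is used purely for illustration.
    try:
        raise ValueError("placeholder for the hidden inner error")
    except ValueError as err:
        raise RuntimeError(
            "Comparing ... resulted in the unexpected exception above."
        ) from err


if __name__ == "__main__":
    try:
        simulated_comparison()
    except RuntimeError as outer:
        for depth, exc in enumerate(exception_chain(outer)):
            print(depth, type(exc).__name__, exc)
```

Applied to the real failure, the last entry in the chain would name the exception that the comparison wrapper is hiding.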
Potential Solutions:
1. Investigate tensor comparison framework for multi-dtype scenarios
2. Check foreach copy implementation for large input handling
3. Review dtype conversion logic in foreach operations
4. Verify comparison tolerances are appropriate for mixed dtypes
5. Check for edge cases in tensor comparison with equal_nan=True
Logs:
Test execution logs available in CPU test suite
Log location: /home/ktanmay/Downloads/Run 1-20260120T060019Z-1-001/Run 1/20260114_024940_commit_6bdd8c9/cpu_tests.log
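The underlying exception should appear in the log just before the "RuntimeError: Comparing" line. A small script could pull out that preceding context for review. This is a sketch under assumptions: the log is plain text with the traceback printed in order, the marker string is taken from the error quoted in this ticket, and the helper name context_before_marker is hypothetical.

```python
from collections import deque


def context_before_marker(lines, marker="RuntimeError: Comparing", before=40):
    """Yield, for each line containing `marker`, the `before` lines preceding it."""
    window = deque(maxlen=before)  # rolling buffer of the most recent lines
    for line in lines:
        if marker in line:
            yield list(window)
        window.append(line)


if __name__ == "__main__":
    import sys

    # Usage: python3 this_script.py <path to cpu_tests.log>
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as log:
            for block in context_before_marker(log):
                print("".join(block))
                print("-" * 60)
```

The window size of 40 lines is an arbitrary default and may need to be raised if the traceback is long.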
Additional Context:
- TestForeachCUDA test running in CPU test suite (may be multi-device test)
- Test involves copying with multiple dtypes and large input sizes
- Related ticket AIPCC-8265 exists for a similar failure on the sGPU platform
- Error message suggests this is an internal comparison framework issue
Severity: Medium
Priority: P3