Type: Bug
Resolution: Unresolved
Priority: Undefined
Fix Version/s: None
Affects Version/s: None
Component/s: PyTorch
Labels:
- sprint_21
- triaged

Blocked:
False
Blocked Reason:
None
Ready:
False

Sprint:
PyTorch Sprint 21, PyTorch Sprint 22, PyTorch Sprint 23

test_nn tests fail on the main branch with a numerical mismatch: tensor values are not close enough

Tests Failing:

test_partial_flat_weights

Env details:

PyTorch version: 2.10.0

Branch: main

OS: RHEL 9.6

CPU: Intel

Python version: 3.12

Commit ID: 6de6685797cabc6256df76803f3a5f772d5275a7 (tag: trunk/6de6685797cabc6256df76803f3a5f772d5275a7, origin/main, origin/HEAD)

Steps to repro:

Pull base image: podman pull quay.io/aipcc/pytorch:rhel_cuda_build_without_pins

Run the image: podman run -it <IMAGE_NAME> (as written, this command passes no GPU device; the test run below uses the CPU configuration)

Run the PyTorch unit tests: TEST_CONFIG=cpu python3 test/run_test.py -i test_nn

Expected result: All test_nn unit tests pass.

Actual result: Numerical mismatch - tensor values are not close enough (9 out of 36 elements differ, greatest absolute difference: 3.0120834708213806e-05, greatest relative difference: 0.0030780492816120386, exceeding tolerance of 1e-05 absolute and 1.3e-06 relative)
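
To make the size of the mismatch easier to judge, here is a minimal sketch of the element-wise closeness rule that torch.testing.assert_close documents, |actual - expected| <= atol + rtol * |expected|. It is written in plain Python so it runs without PyTorch. The tolerances and the greatest absolute difference are copied from the failure message above; the helper names is_close and min_magnitude_to_pass are hypothetical and are not part of PyTorch.

```python
# Closeness rule documented for torch.testing.assert_close (per element):
#   |actual - expected| <= atol + rtol * |expected|
RTOL = 1.3e-06  # relative tolerance quoted in the failure message
ATOL = 1e-05    # absolute tolerance quoted in the failure message

# Greatest absolute difference quoted in the failure message.
MAX_ABS_DIFF = 3.0120834708213806e-05


def is_close(actual: float, expected: float,
             rtol: float = RTOL, atol: float = ATOL) -> bool:
    """Return True if a single element is within tolerance."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


def min_magnitude_to_pass(abs_diff: float,
                          rtol: float = RTOL, atol: float = ATOL) -> float:
    """Smallest |expected| for which abs_diff would still be accepted."""
    return max(0.0, (abs_diff - atol) / rtol)


if __name__ == "__main__":
    # An element near 1.0 that is off by the reported difference is rejected.
    print(is_close(1.0 + MAX_ABS_DIFF, 1.0))
    # Magnitude the expected value would need for that difference to pass.
    print(min_magnitude_to_pass(MAX_ABS_DIFF))
```

Under these tolerances, an element that is off by the reported greatest absolute difference is only accepted if its expected magnitude is at least about 15.5, so the absolute tolerance dominates for small values. The failure message does not say which elements the two reported maxima belong to, so this sketch treats the absolute difference on its own.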

Logs are attached below

test_nn.log
2025/12/11 11:08 AM
1.17 MB
Nayan Bhushan Kanganahalli Nagabhushana

Assignee: Nayan Bhushan Kanganahalli Nagabhushana

Reporter: Nayan Bhushan Kanganahalli Nagabhushana

Votes: 0

Watchers: 2

Created: 2025/12/11 11:06 AM

Updated: 2026/01/12 9:44 AM
