AI Platform Core Components / AIPCC-6334

nn.CrossEntropyLoss overflow with FP16 and minibatch

    • Type: Story
    • Resolution: Unresolved
    • Priority: Undefined
    • Component: PyTorch
    • Sprints: PyTorch Sprint 18, PyTorch Sprint 19, PyTorch Sprint 20, PyTorch Sprint 21, PyTorch Sprint 22, PyTorch Sprint 23, PyTorch Sprint 24

          🐛 Describe the bug

      Using nn.CrossEntropyLoss with FP16 and a long sequence is stable. However, introducing a minibatch dimension leads to overflow, and `CrossEntropyLoss` outputs `inf`.

      To reproduce:

      ```Python
      import torch

      # Case 1: FP16 loss with a minibatch dimension -> overflows to inf.
      # Input shape is (batch=20, classes=14749, sequence=1025).
      ce = torch.nn.CrossEntropyLoss().cuda().half()

      inp = torch.rand((20, 14749, 1025))
      inp = inp.cuda().half()
      t = torch.randint(low=0, high=14749, size=[20, 1025]).cuda()

      loss = ce(inp, t)
      print(loss)  # inf

      # Case 2: the same computation in FP32 -> finite, correct loss.
      ce = torch.nn.CrossEntropyLoss().cuda()
      inp = torch.rand((20, 14749, 1025))
      inp = inp.cuda()

      loss = ce(inp, t)
      print(loss)  # finite

      # Case 3: FP16 again, but with batch and sequence flattened into a
      # single dimension, giving input shape (20 * 1025, 14749) -> finite.
      ce.half()
      inp = inp.cuda().half()
      inp = inp.transpose(1, 2)
      inp = inp.flatten(start_dim=0, end_dim=1)
      t = t.flatten(start_dim=0, end_dim=1)

      loss = ce(inp, t)
      print(loss)  # finite
      ```

      The first loss is `inf`; the second and third are finite and correct.
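      A plausible explanation (my reasoning, not confirmed in the report): each per-token loss is roughly log(14749) ≈ 9.6, and the default `mean` reduction first sums 20 × 1025 = 20500 of them, about 2 × 10⁵, which exceeds FP16's maximum finite value of 65504. Any kernel that holds that intermediate sum in half precision therefore overflows; the flattened 2-D path is presumably the one that accumulates in FP32. A quick check of the magnitudes:

      ```Python
      import torch

      # Back-of-the-envelope check: the intermediate sum does not fit in FP16.
      per_token = torch.full((20 * 1025,), 9.6)   # ~log(14749) per token, FP32
      total = per_token.sum()                     # ~196800
      print(torch.finfo(torch.float16).max)       # 65504.0
      print(total.half())                         # inf: overflows if kept in FP16
      print((total / (20 * 1025)).half())         # ~9.6: fine after the division
      ```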

          Versions

      I tested on PyTorch 1.8.2 and 1.12.1; both behave the same.
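      A workaround that avoids the overflow on both versions (my suggestion, not part of the original report) is to upcast the FP16 logits to FP32 before the loss:

      ```Python
      import torch

      # Workaround sketch: keep the loss computation in FP32 by upcasting
      # the FP16 logits before the reduction.
      ce = torch.nn.CrossEntropyLoss().cuda()    # leave the loss module in FP32

      inp = torch.rand((20, 14749, 1025)).cuda().half()
      t = torch.randint(low=0, high=14749, size=[20, 1025]).cuda()

      loss = ce(inp.float(), t)                  # upcast: loss is finite
      print(loss)
      ```

      `torch.cuda.amp.autocast` does the same thing automatically: `cross_entropy` is on its float32 op list, so under autocast the loss never runs in half precision.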

      cc @albanD @mruberry @jbschlosser @walterddr @kshitij12345 @saketh-are

              Assignee: rh-ee-visgoyal Vishal Goyal
              Reporter: rh-ee-visgoyal Vishal Goyal
              Team: PyTorch Core
              Votes: 0
              Watchers: 3