-
Story
-
Resolution: Unresolved
-
PyTorch Sprint 18, PyTorch Sprint 19, PyTorch Sprint 20, PyTorch Sprint 21, PyTorch Sprint 22, PyTorch Sprint 23, PyTorch Sprint 24
-
-
- 🐛 Describe the bug
-
Using `nn.CrossEntropyLoss` with FP16 on a long sequence is stable. However, introducing a minibatch dimension leads to overflow, and `CrossEntropyLoss` outputs `inf`.
To reproduce:
```Python
import torch

# Case 1: FP16 with a minibatch dimension, input shaped (N, C, L). Loss is inf.
ce = torch.nn.CrossEntropyLoss().cuda().half()
inp = torch.rand((20, 14749, 1025))
inp = inp.cuda().half()
t = torch.randint(low=0, high=14749, size=[20, 1025]).cuda()
loss = ce(inp, t)
print(loss)

# Case 2: same shapes in FP32. Loss is correct.
ce = torch.nn.CrossEntropyLoss().cuda()
inp = torch.rand((20, 14749, 1025))
inp = inp.cuda()
loss = ce(inp, t)
print(loss)

# Case 3: FP16 with batch and sequence flattened to (N * L, C). Loss is correct.
ce.half()
inp = inp.cuda().half()
inp = inp.transpose(1, 2)
inp = inp.flatten(start_dim=0, end_dim=1)
t = t.flatten(start_dim=0, end_dim=1)
loss = ce(inp, t)
print(loss)
```
The first loss is `inf`. The second and third are both correct.
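A rough back-of-the-envelope check of why the batched case could overflow, using only the shapes from the reproduction above. This is a sketch under an assumption the report does not confirm: that the batched FP16 path accumulates the un-normalised sum of per-element losses in FP16 before dividing by the element count. The constant names are illustrative only.

```python
import math
import struct

# Largest finite FP16 value, decoded from its bit pattern 0x7BFF.
FP16_MAX = struct.unpack("<e", b"\xff\x7b")[0]  # 65504.0

# Shapes taken from the reproduction script.
NUM_CLASSES = 14749
BATCH = 20
SEQ_LEN = 1025

# For logits drawn from [0, 1):
#   log(C) <= logsumexp(x) < log(C) + 1   and   0 <= x_target < 1,
# so each per-element loss, logsumexp(x) - x_target, lies strictly
# between log(C) - 1 and log(C) + 1.
per_elem_low = math.log(NUM_CLASSES) - 1.0
per_elem_high = math.log(NUM_CLASSES) + 1.0

# Lower bound on the summed loss over the whole minibatch (N * L elements).
batched_sum_low = BATCH * SEQ_LEN * per_elem_low
# Upper bound on the summed loss over a single sequence (L elements).
single_sum_high = SEQ_LEN * per_elem_high

print("FP16 max:", FP16_MAX)
print("batched sum exceeds FP16 max:", batched_sum_low > FP16_MAX)
print("single-sequence sum fits in FP16:", single_sum_high < FP16_MAX)
```

Under that assumption, the summed loss for a single sequence stays well inside the FP16 range while the sum over all 20 sequences cannot, which matches the observed `inf`. The flattened case covers the same number of elements yet returns a correct value, which suggests it takes a code path that accumulates at higher precision; this too is an inference, not something verified here.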
-
-
- Versions
-
I tested on 1.8.2 and 1.12.1; both behave the same.
cc @albanD @mruberry @jbschlosser @walterddr @kshitij12345 @saketh-are
- clones
-
AIPCC-6062 torch.nn.functional.softshrink throws overflow error on CUDA but not on CPU
-
- Closed
-
- is cloned by
-
AIPCC-6398 `input_size` argument of `nn.RNN()` gets indirect error messages
-
- Closed
-
- mentioned on