Resolution: Done
### 🚀 The feature, motivation and pitch

### Feature
Teach TorchInductor to generate native `ldexp` code for the CUDA and CPU backends instead of relying on the decomposition `x * pow(2, n)`.
### Motivation
Currently, `torch.ldexp` is decomposed into:
```python
x * torch.pow(2.0, n)
```
This decomposition is suboptimal because:
- It computes `pow(2, n)` and then a multiplication, while native `ldexp` implementations are significantly faster: they operate directly on the floating-point exponent bits instead of materializing the intermediate power
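To make the identity concrete, here is the semantics `ldexp` implements, shown with Python's `math.ldexp` (which already takes the native exponent-bit path):

```python
import math

# ldexp(x, n) == x * 2**n, but computed by adjusting the exponent
# field of x directly rather than materializing pow(2, n).
x, n = 0.75, 10
assert math.ldexp(x, n) == x * 2.0 ** n  # 768.0
print(math.ldexp(x, n))
```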
### Proposed Solution
Add Inductor codegen support for `aten.ldexp.Tensor` to emit native calls:
- *CUDA*: Use `__nv_ldexp` from libdevice
- *CPU*: Use `std::ldexp` from `<cmath>`
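As a rough sketch of the per-backend mapping (the function name and dispatch shape here are illustrative, not the actual Inductor codegen API):

```python
def emit_ldexp(backend: str, x: str, n: str) -> str:
    """Illustrative only: the expression a codegen backend might
    emit for ldexp(x, n) instead of the pow-based decomposition."""
    if backend == "cuda":
        # libdevice float variant; the double variant is __nv_ldexp
        return f"__nv_ldexpf({x}, {n})"
    if backend == "cpp":
        return f"std::ldexp({x}, {n})"  # from <cmath>
    raise ValueError(f"no native ldexp for backend {backend!r}")

print(emit_ldexp("cuda", "tmp0", "tmp1"))  # __nv_ldexpf(tmp0, tmp1)
```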
### Implementation Notes
-
- libdevice provides `__nv_ldexp(double, int)` and `__nv_ldexpf(float, int)`
- Similar patterns exist for other math ops in `torch/_inductor/codegen/`
- May need special handling when the exponent input is not an integer dtype, since the native `ldexp` functions take an `int` exponent
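One possible shape for that last note, sketched in pure Python with `math.ldexp` standing in for the native call (the dispatch and fallback here are assumptions, not the planned implementation): apply the native path only for integer exponents and keep the existing pow-based decomposition otherwise.

```python
import math

def lower_ldexp(x: float, n) -> float:
    # Native ldexp takes an int exponent; fall back to the
    # pow-based decomposition for non-integer exponents.
    if isinstance(n, int):
        return math.ldexp(x, n)
    return x * 2.0 ** n

assert lower_ldexp(1.5, 3) == 12.0        # native path
assert lower_ldexp(1.0, 0.5) == 2 ** 0.5  # fallback path
```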
cc @ngimel @chauhang @penguinwu