Resolution: Done
### 🚀 The feature, motivation and pitch

### Feature
Teach TorchInductor to generate native `ldexp` code for the CUDA and CPU backends instead of relying on the decomposition `x * pow(2, n)`.
### Motivation
Currently, `torch.ldexp` is decomposed into:
```python
x * torch.pow(2.0, n)
```
This decomposition is suboptimal because:
- It computes `pow(2, n)` and then a multiplication, while native `ldexp` implementations are significantly faster: they operate directly on the floating-point exponent bits instead of materializing the intermediate power
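To make the identity concrete, here is the semantics `ldexp` implements, shown with Python's `math.ldexp` (which already takes the native exponent-bit path):

```python
import math

# ldexp(x, n) == x * 2**n, but computed by adjusting the exponent
# field of x directly rather than materializing pow(2, n).
x, n = 0.75, 10
assert math.ldexp(x, n) == x * 2.0 ** n  # 768.0
print(math.ldexp(x, n))
```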
### Proposed Solution
Add Inductor codegen support for `aten.ldexp.Tensor` to emit native calls:
- *CUDA*: Use `__nv_ldexp` from libdevice
- *CPU*: Use `std::ldexp` from `<cmath>`
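As a rough sketch of the per-backend mapping (the function name and dispatch shape here are illustrative, not the actual Inductor codegen API):

```python
def emit_ldexp(backend: str, x: str, n: str) -> str:
    """Illustrative only: the expression a codegen backend might
    emit for ldexp(x, n) instead of the pow-based decomposition."""
    if backend == "cuda":
        # libdevice float variant; the double variant is __nv_ldexp
        return f"__nv_ldexpf({x}, {n})"
    if backend == "cpp":
        return f"std::ldexp({x}, {n})"  # from <cmath>
    raise ValueError(f"no native ldexp for backend {backend!r}")

print(emit_ldexp("cuda", "tmp0", "tmp1"))  # __nv_ldexpf(tmp0, tmp1)
```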
### Implementation Notes
-
- libdevice provides `__nv_ldexp(double, int)` and `__nv_ldexpf(float, int)`
- Similar patterns exist for other math ops in `torch/_inductor/codegen/`
- May need special handling when the exponent input is not an integer dtype, since the native `ldexp` functions take an `int` exponent
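One possible shape for that last note, sketched in pure Python with `math.ldexp` standing in for the native call (the dispatch and fallback here are assumptions, not the planned implementation): apply the native path only for integer exponents and keep the existing pow-based decomposition otherwise.

```python
import math

def lower_ldexp(x: float, n) -> float:
    # Native ldexp takes an int exponent; fall back to the
    # pow-based decomposition for non-integer exponents.
    if isinstance(n, int):
        return math.ldexp(x, n)
    return x * 2.0 ** n

assert lower_ldexp(1.5, 3) == 12.0        # native path
assert lower_ldexp(1.0, 0.5) == 2 ** 0.5  # fallback path
```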
cc @ngimel @chauhang @penguinwu