AI Platform Core Components
AIPCC-8336

Teach TorchInductor to generate native `ldexp` code for CUDA and CPU backends instead of relying on the decomposition `x * pow(2, n)`.

    • Type: Story
    • Resolution: Done
    • Component: PyTorch
    • Sprint: PyTorch Sprint 23

      🚀 The feature, motivation and pitch

      Feature

      Teach TorchInductor to generate native `ldexp` code for CUDA and CPU backends instead of relying on the decomposition `x * pow(2, n)`.

      Motivation

      Currently, `torch.ldexp` is decomposed into:
      ```python
      x * torch.pow(2.0, n)
      ```

      This decomposition is suboptimal because:

      • It requires computing `pow(2, n)` and then a multiplication, whereas native `ldexp` implementations are significantly faster because they operate directly on the floating-point exponent bits
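The scaling identity behind the decomposition can be seen with Python's standard-library `math.ldexp`, which wraps the C library's native implementation; this snippet is purely illustrative and is not Inductor code:

```python
import math

# math.ldexp wraps the C library's native ldexp: it adjusts the float's
# exponent field directly instead of evaluating pow(2, n) and multiplying.
# For these inputs the result matches the decomposition exactly.
for x, n in [(0.5, 3), (1.25, -2), (-7.0, 10)]:
    assert math.ldexp(x, n) == x * 2.0 ** n
```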
      Proposed Solution

      Add Inductor codegen support for `aten.ldexp.Tensor` to emit native calls:

      • *CUDA*: Use `__nv_ldexp` from libdevice
      • *CPU*: Use `std::ldexp` from `<cmath>`
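As a rough sketch, backend override classes in `torch/_inductor/codegen/` emit source strings per op; the class names, the string-in/string-out convention, and the `libdevice` spelling below are illustrative assumptions, not the actual Inductor API:

```python
# Hypothetical sketch of per-backend codegen hooks for ldexp. Each method
# receives already-generated operand expressions as strings and returns the
# expression to splice into the kernel body.

class CppOverridesSketch:
    """CPU backend sketch: emit a std::ldexp call (requires <cmath>)."""

    @staticmethod
    def ldexp(x: str, n: str) -> str:
        return f"std::ldexp({x}, {n})"


class TritonOverridesSketch:
    """CUDA backend sketch: emit a libdevice ldexp call."""

    @staticmethod
    def ldexp(x: str, n: str) -> str:
        return f"libdevice.ldexp({x}, {n})"
```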
      Implementation Notes
      • libdevice provides `__nv_ldexp(double, int)` and `__nv_ldexpf(float, int)`
      • Similar patterns exist for other math ops in `torch/_inductor/codegen/`
      • May need special handling when the exponent input has a non-integer dtype, since the native `ldexp` functions take an `int` exponent
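The exponent-type note can be illustrated with the standard-library `math.ldexp`, which mirrors the C signature's integral exponent; the cast-or-fallback idea in the comment is an assumption, not a decided design:

```python
import math

# Native ldexp signatures take an integral exponent (std::ldexp(double, int),
# __nv_ldexp(double, int)); Python's math.ldexp mirrors this and rejects a
# float exponent. Codegen may therefore need a cast, or a fallback to the
# x * pow(2, n) decomposition, when n has a floating dtype (assumption).
raised = False
try:
    math.ldexp(1.0, 2.5)  # float exponent: rejected
except TypeError:
    raised = True
assert raised

assert math.ldexp(1.0, 2) == 4.0  # int exponent works as expected
```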
      cc @ngimel

      cc @chauhang @penguinwu

              rh-ee-chleonar Christopher Leonard
              PyTorch Core
