  AI Platform Core Components / AIPCC-5609

Memory leak when using mark_dirty in Python custom autograd.Function

    • Type: Story
    • Resolution: Done
    • Component: PyTorch
    • Sprint: PyTorch Sprint 15, PyTorch Sprint 16

          1. 🐛 Describe the bug

      I'm observing a memory leak when using a custom torch.autograd.Function with an in-place operation marked by ctx.mark_dirty(...). The following minimal repro highlights the issue:

      ```python
      import torch

      class ReluOps(torch.autograd.Function):
          @staticmethod
          def forward(ctx, x):
              # In-place ReLU; mark the input dirty so autograd knows it was
              # modified in place, and save it for the backward pass.
              x.relu_()
              ctx.mark_dirty(x)
              ctx.save_for_backward(x)
              return x

          @staticmethod
          def backward(ctx, grad):
              (x,) = ctx.saved_tensors
              return grad * x

      def run():
          for i in range(100):
              x = torch.rand((100, 100, 1000), requires_grad=True)
              z = x + 1.0
              z_view = z[0]
              ReluOps.apply(z_view)

      if __name__ == "__main__":
          run()
      ```

      Memory usage increases with each iteration and is not released between invocations.
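
      For reference, one way to observe this growth (an illustrative sketch, not part of the original report) is to sum the bytes held by every tensor the garbage collector can still reach at the end of each iteration of run():

      ```python
      import gc

      import torch

      def live_tensor_bytes():
          # Sum the storage bytes of every tensor still reachable by the GC.
          total = 0
          for obj in gc.get_objects():
              try:
                  if torch.is_tensor(obj):
                      total += obj.element_size() * obj.nelement()
              except Exception:
                  # Some tracked objects cannot be inspected safely; skip them.
                  pass
          return total
      ```

      If the saved tensors were freed between iterations, this number would stay roughly flat; in the leaking case it keeps climbing.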

      The reference cycle looks like this:
      `CopySlices -> PyNode -> THPFunction -> SavedVariable -> AsStridedBackward -> CopySlices`
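
      One way to check that these objects are only being kept alive by such a cycle (a diagnostic sketch, assuming the cyclic GC can still find them) is to force a collection with gc.DEBUG_SAVEALL after run() and look at what had to be collected:

      ```python
      import gc

      gc.set_debug(gc.DEBUG_SAVEALL)  # keep everything the collector finds in gc.garbage
      run()
      unreachable = gc.collect()
      print(f"collector found {unreachable} unreachable objects")
      for obj in gc.garbage[:20]:
          # If the leak is the cycle above, autograd graph objects show up here
          # instead of being freed by plain reference counting.
          print(type(obj).__name__)
      ```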

          2. Versions

            torch master; reproduced by running the code above.

      cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @albanD @gqchen @nikitaved @soulitzer @Varal7 @xmfan @chauhang @penguinwu

              Assignee: Vishal Goyal (rh-ee-visgoyal)
              Reporter: Vishal Goyal (rh-ee-visgoyal)
              Team: PyTorch Core