Red Hat Enterprise Linux AI / RHELAI-3102

Add liger kernels to training loop


      Feature Overview (mandatory - Complete while in New status)

      Liger Kernel is a collection of Triton kernels designed specifically for LLM training. It can effectively increase multi-GPU training throughput by 20% and reduce memory usage by 60%.

      IBM Cloud is looking to reduce training time in its SaaS offering. Liger kernels (https://github.com/linkedin/Liger-Kernel) have been shown to improve the performance of training workloads by 20-30% through significant memory reduction and higher GPU throughput.

      Goals
      RHEL AI users see improved training times out of the box as a result of Liger kernels.

      Requirements:

      • Create an option to enable or disable the use of Liger kernels.
      • If experimental tests show an improvement and no side effects, set the default of that option to true.
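
A minimal sketch of how such an option might gate the kernels in the training loop. The flag name `use_liger` is an assumption, since this issue does not specify the actual ilab option name. `TrainingOptions`, `liger_available`, `resolve_use_liger`, and `load_model` are hypothetical names used only for illustration. `AutoLigerKernelForCausalLM` is the drop-in loader described in the Liger-Kernel README.

```python
import importlib.util
from dataclasses import dataclass


@dataclass
class TrainingOptions:
    # Hypothetical flag name; the real ilab config key is not given in this issue.
    # Defaults to False until experiments justify enabling it by default.
    use_liger: bool = False


def liger_available() -> bool:
    """Return True if the liger_kernel package can be imported."""
    return importlib.util.find_spec("liger_kernel") is not None


def resolve_use_liger(opts: TrainingOptions) -> bool:
    """Use Liger kernels only if requested and the package is installed."""
    return bool(opts.use_liger and liger_available())


def load_model(model_path: str, opts: TrainingOptions):
    """Load a causal LM, with Liger kernels patched in when enabled."""
    if resolve_use_liger(opts):
        # Drop-in replacement for AutoModelForCausalLM from Liger-Kernel.
        from liger_kernel.transformers import AutoLigerKernelForCausalLM

        return AutoLigerKernelForCausalLM.from_pretrained(model_path)

    # Fallback: the standard Hugging Face loader, with no kernel patching.
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(model_path)
```

Falling back silently when the package is missing is one possible design choice. A stricter variant could raise an error so that a misconfigured environment is noticed.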

      Done - Acceptance Criteria:

      An option for enabling Liger kernels is present in the ilab config file.

      Use Cases - i.e. User Experience & Workflow:

      (Initial completion while in Refinement status):
      n/a

      Out of Scope (Initial completion while in Refinement status):
      n/a

      Documentation Considerations (Initial completion while in Refinement status):
      Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation.

      In the ideal scenario, this is a setting that a user would not need to worry about. However, reference documentation that lists the option and explains what it does would be appropriate.

      Questions to Answer (Initial completion while in Refinement status):
      What gains can we expect to see with this optimization?
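
One way to answer this is to run the same training job with the option off and on, then compare throughput and peak memory. The sketch below shows only the arithmetic. `RunStats`, `percent_change`, and `summarize` are hypothetical names, and how the two measurements are collected is left open.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    tokens_per_second: float  # measured training throughput
    peak_memory_gib: float    # measured peak GPU memory


def percent_change(baseline: float, candidate: float) -> float:
    """Signed percentage change of candidate relative to baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (candidate - baseline) / baseline * 100.0


def summarize(baseline: RunStats, liger: RunStats) -> dict:
    """Report throughput gain and memory reduction as positive-is-better percentages."""
    return {
        "throughput_gain_pct": percent_change(
            baseline.tokens_per_second, liger.tokens_per_second
        ),
        # Negated so that lower memory use shows up as a positive reduction.
        "memory_reduction_pct": -percent_change(
            baseline.peak_memory_gib, liger.peak_memory_gib
        ),
    }
```

Running both configurations on the same hardware, model, and batch size keeps the comparison meaningful.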

      Background and Strategic Fit (Initial completion while in Refinement status):
      Improving performance and bringing down training times improves the experience for our customers.

      Customer Considerations (Initial completion while in Refinement status):
      n/a

              wcabanba@redhat.com William Caban
              jgreene@redhat.com Jason Greene