AI Platform Core Components / AIPCC-5156

Create training hub test collections

    • Epic
    • Resolution: Done
    • Critical
    • Development Platform
    • True
    • Blocked by flash-attn pin, AIPCC-5519
    • False
    • Done
    • AIPCC-4781 - Training Hub Standalone Library/ Python Package
    • 0% To Do, 0% In Progress, 100% Done

      Feature Overview:

      • Deliver a modular training techniques library supporting SFT, preference tuning, reasoning optimization, and other advanced training methods developed by the Red Hat AI Innovation team.
      • Build a Training Hub Python package with all of its dependencies. This feature packages all Training Hub dependencies into a Red Hat-maintained Python package index so that OpenShift AI customers can pull them directly when building their own custom container images for AI workloads. This ensures compatibility and simplifies the user experience for customers who move beyond the pre-built images.

      Goals:

      • Provide standalone training capabilities
      • Support multiple training techniques developed by the Red Hat AI Innovation team
      • Enable direct PyTorch primitive access
      • Maintain technique modularity and independence

      Requirements:

      • Pip-installable standalone library
      • Python SDK with technique-specific APIs
      • CLI for training operations
      • Support for SFT, Async GRPO, DPO, and other techniques developed by the Red Hat AI Innovation team
      • Direct PyTorch integration without high-level abstractions

       
       
      MVP 3.0:

       

      Dependencies for 3.0:

      #ALL DEPS ACROSS training-hub, instructlab-training, mini-trainer

      torch>=2.6
      typer
      deprecated
      ninja
      numba
      packaging>=20.9
      wheel>=0.43
      pyyaml
      py-cpuinfo
      transformers>=4.45.2
      datasets>=2.15.0
      numpy>=1.26.4,<2.0.0
      rich
      trl>=0.9.4
      peft
      pydantic>=2.7.0
      aiofiles>=23.2.1
      accelerate>=0.34.2

      # OUR LIBRARIES (RELYING ON ONLY ABOVE REQUIREMENTS)
      instructlab-training>=0.11.1
      mini-trainer (NO VERSION YET)

      # THE TRAINING HUB PACKAGE ITSELF
      training-hub (NO VERSION YET)

      # FOR CUDA
      bitsandbytes>=0.43.1
      liger-kernel>=0.5.4
      flash-attn>=2.8.2

      # OUR LIBRARIES WITH CUDA EXTRAS (RELYING ON ONLY THE ABOVE EXTRA REQUIREMENTS)
      instructlab-training[cuda]>=0.11.1
      mini-trainer[cuda] (NO VERSION YET)
      training-hub[cuda] (NO VERSION YET)
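
      A test collection for these pins could start by checking that the versions installed in an image satisfy the constraints listed above. The following is a minimal sketch using only the standard library, not the project's actual test code. The names PINS, release, satisfies, and check_installed are hypothetical, and only a subset of the pins is included. It compares numeric release segments only, so pre-release and local version suffixes are ignored; in practice the `packaging` library (already in the dependency list) provides `packaging.specifiers.SpecifierSet` for full PEP 440 handling.

```python
import re
from importlib import metadata

# A subset of the pins from the list above; extend with the rest as needed.
PINS = {
    "torch": ">=2.6",
    "transformers": ">=4.45.2",
    "numpy": ">=1.26.4,<2.0.0",
    "pydantic": ">=2.7.0",
}

# One comparison clause, e.g. ">=1.26.4" or "<2.0.0".
_CLAUSE = re.compile(r"\s*(>=|<=|==|<|>)\s*(\S+)\s*")


def release(version: str) -> tuple:
    """Return the leading numeric release segment, e.g. '2.6.0+cu124' -> (2, 6, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def satisfies(version: str, spec: str) -> bool:
    """Check a version against comma-separated clauses such as '>=1.26.4,<2.0.0'."""
    installed = release(version)
    for clause in spec.split(","):
        parsed = _CLAUSE.fullmatch(clause)
        if parsed is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound_text = parsed.groups()
        bound = release(bound_text)
        # Pad with zeros so that 2.6 and 2.6.0 compare as equal.
        width = max(len(installed), len(bound))
        a = installed + (0,) * (width - len(installed))
        b = bound + (0,) * (width - len(bound))
        ok = {">=": a >= b, "<=": a <= b, "==": a == b, "<": a < b, ">": a > b}[op]
        if not ok:
            return False
    return True


def check_installed(pins: dict) -> dict:
    """Map each package to 'ok', 'missing', or 'mismatch (<found version>)'."""
    report = {}
    for name, spec in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if satisfies(found, spec) else f"mismatch ({found})"
    return report


if __name__ == "__main__":
    for name, status in check_installed(PINS).items():
        print(f"{name}: {status}")
```

      Run inside a built image, the script prints one status line per pinned package, which makes a missing or out-of-range dependency visible before any training job starts.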

       

       The training-hub package must install and run successfully on ARM64 (aarch64) CPUs in addition to x86_64.
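
       As a rough illustration of how a test collection might guard the architecture requirement, the sketch below normalizes the value reported by `platform.machine()` and checks it against the two architectures named above. The names SUPPORTED, normalize_arch, and is_supported are hypothetical and not part of training-hub.

```python
import platform

# Architectures named in the requirement above.
SUPPORTED = {"x86_64", "aarch64"}

# platform.machine() spells the same architecture differently across
# operating systems (for example "arm64" on macOS, "AMD64" on Windows).
_ALIASES = {"amd64": "x86_64", "arm64": "aarch64"}


def normalize_arch(machine: str) -> str:
    """Map a platform.machine() value to a canonical architecture name."""
    name = machine.strip().lower()
    return _ALIASES.get(name, name)


def is_supported(machine=None) -> bool:
    """Return True if the given (or current) machine is a supported architecture."""
    if machine is None:
        machine = platform.machine()
    return normalize_arch(machine) in SUPPORTED


if __name__ == "__main__":
    arch = normalize_arch(platform.machine())
    print(f"{arch}: {'supported' if is_supported() else 'not supported'}")
```

       A check like this confirms only that the test run landed on an expected architecture; the install-and-run requirement itself still needs an actual installation and a smoke test on each platform.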

       

              lmohanty@redhat.com Lalatendu Mohanty
              Antonio's Team