  1. AI Platform Core Components
  2. AIPCC-10113

Research Kubernetes-native queueing and scheduling solutions to expand the evaluation landscape beyond Kueue

    • Story
    • Resolution: Done
    • Model Validation
    • AIPCC Accelerators 26

      Goal

      Expand the evaluation beyond the currently defined list by researching additional Kubernetes-native queueing and scheduling solutions relevant to GPU-as-a-Service (GPUaaS).

       

      Description

      This story aims to broaden the GPUaaS research landscape beyond the solutions already defined in the epic, using Kubernetes Kueue as the starting baseline.

       

      The work begins with a focused review of Kubernetes Kueue to establish a shared understanding of its design, scope, and limitations.

      Based on that understanding, the research then expands outward to identify additional Kubernetes-native solutions in the ecosystem that address similar problems.

       

      The objective is to widen the scope at the research level, so that the initial shortlist does not constrain us and no relevant Kubernetes-based approach is overlooked.

       

      The research should focus on solutions that:

      • Run on top of Kubernetes
      • Address queueing, admission control, or scheduling concerns
      • Are relevant to GPU or other scarce, high-cost resources
      • Could potentially serve GPUaaS-style requirements
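
      The four focus criteria above could be applied as a simple screening checklist during the survey. The following is a minimal sketch, assuming each criterion is recorded as a yes/no reviewer judgment; the `Candidate` type, the `unmet_criteria` helper, and the two example entries are hypothetical and do not refer to real projects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One surveyed solution. Every flag is a reviewer's judgment, not a measurement."""
    name: str
    runs_on_kubernetes: bool
    handles_queueing_admission_or_scheduling: bool
    targets_scarce_resources: bool
    could_serve_gpuaas: bool


def unmet_criteria(c: Candidate) -> list[str]:
    """Return the focus criteria the candidate fails; an empty list means in scope."""
    checks = {
        "runs on top of Kubernetes": c.runs_on_kubernetes,
        "addresses queueing, admission control, or scheduling":
            c.handles_queueing_admission_or_scheduling,
        "relevant to GPU or other scarce, high-cost resources":
            c.targets_scarce_resources,
        "could potentially serve GPUaaS-style requirements":
            c.could_serve_gpuaas,
    }
    return [label for label, met in checks.items() if not met]


# Hypothetical placeholder entries, used only to show the screening step.
candidates = [
    Candidate("example-queue-controller", True, True, True, True),
    Candidate("example-external-batch-system", False, True, True, True),
]

# Names of candidates that satisfy all four criteria.
in_scope = [c.name for c in candidates if not unmet_criteria(c)]
```

      Recording the unmet criteria, rather than a bare pass/fail, keeps the reason a solution was set aside visible in the research summary.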

       

      This story is about exploration and awareness, not commitment or implementation.

       

      Scope includes

      • Studying Kubernetes Kueue as the baseline reference
      • Surveying the Kubernetes ecosystem for comparable or complementary solutions
      • Identifying design patterns and architectural approaches used in the industry
      • Highlighting solutions that may warrant deeper evaluation later

       

      Definition of Done (DoD)

      A research summary document exists that:

      • Describes Kubernetes Kueue as the baseline
      • Lists additional Kubernetes-native solutions identified during research
      • Explains how each solution expands or differs from the current evaluation set
      • Documents why each solution is, or is not, relevant or potentially relevant to GPUaaS

       

      The document is shared with the team and informs whether additional technologies should be added to the evaluation phase.

              rh-ee-vshaw Vikash Shaw
              rh-ee-abadli Aviran Badli
              Frank's Team