OpenShift Container Platform (OCP) Strategy / OCPSTRAT-1590

HCP Capacity Blocks Support for GPU Reservations

      Feature Overview:

      Enable support for AWS Capacity Blocks in HCP (NodePool API) so that users can guarantee access to reserved EC2 instances, particularly for critical workloads that require specialized hardware such as GPUs.

      Goals:

      • Users can create and manage Capacity Blocks through HCP (NodePool API).
      • Users can associate Machine Pools with Capacity Blocks to guarantee resource availability.

      Requirements:

      • NodePool API Extension: Implement necessary API changes to expose Capacity Blocks in NodePool.
        • Ability to create, view, and delete Capacity Blocks.
        • Ability to associate Machine Pools with Capacity Blocks.
        • Ability to specify instance type, quantity, and duration for Capacity Blocks.
      • User Documentation: Provide documentation on how to use Capacity Blocks within HCP.
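
The requirements above could map onto a data shape along the following lines. This is a minimal Python sketch under stated assumptions, not the NodePool API: the names `CapacityBlockRequest` and `CapacityBlockRegistry` are hypothetical, the in-memory registry only stands in for whatever would back the real create, view, and delete operations, and the refusal to delete a block that is still associated is one possible policy, not a stated requirement.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class CapacityBlockRequest:
    """Hypothetical request shape: instance type, quantity, and duration."""
    name: str
    instance_type: str
    instance_count: int
    duration_hours: int

    def __post_init__(self) -> None:
        # Reject requests that cannot describe a real reservation.
        if not self.instance_type:
            raise ValueError("instance_type must be set")
        if self.instance_count < 1:
            raise ValueError("instance_count must be at least 1")
        if self.duration_hours < 1:
            raise ValueError("duration_hours must be at least 1")


class CapacityBlockRegistry:
    """In-memory stand-in for create/view/delete plus pool association."""

    def __init__(self) -> None:
        self._blocks: Dict[str, CapacityBlockRequest] = {}
        self._pools: Dict[str, str] = {}  # pool name -> block name

    def create(self, request: CapacityBlockRequest) -> None:
        if request.name in self._blocks:
            raise ValueError(f"capacity block {request.name!r} already exists")
        self._blocks[request.name] = request

    def view(self, name: str) -> Optional[CapacityBlockRequest]:
        return self._blocks.get(name)

    def associate(self, pool_name: str, block_name: str) -> None:
        if block_name not in self._blocks:
            raise KeyError(f"unknown capacity block {block_name!r}")
        self._pools[pool_name] = block_name

    def pools_for(self, block_name: str) -> List[str]:
        return sorted(p for p, b in self._pools.items() if b == block_name)

    def delete(self, name: str) -> None:
        # Assumed policy: a block that still backs a pool cannot be deleted.
        if name in self._pools.values():
            raise ValueError(f"capacity block {name!r} is still associated")
        self._blocks.pop(name, None)


if __name__ == "__main__":
    registry = CapacityBlockRegistry()
    # "p5.48xlarge" and the pool name are example values only.
    registry.create(CapacityBlockRequest("gpu-block", "p5.48xlarge", 2, 24))
    registry.associate("training-pool", "gpu-block")
    print(registry.pools_for("gpu-block"))
```

Keeping the request immutable and validating it on construction means an invalid quantity or duration is rejected before anything is stored, which is one way to keep the create path simple.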

      Deployment Considerations:

      • Self-managed, managed, or both: Both
      • Classic (standalone cluster): N/A
      • Hosted control planes: Yes (HCP)
      • Multi node, Compact (three node), or Single node (SNO), or all: N/A
      • Connected / Restricted Network: Both
      • Architectures: x86_64 (initially)
      • Operator compatibility: N/A
      • Backport needed: N/A
      • UI need: OpenShift Console and/or OCM
      • Other: Consider integration with cost management tools.

      Use Cases:

      • A data science team needs to ensure access to a specific number of GPU instances for training their machine learning models. They can use Capacity Blocks to reserve these instances, guaranteeing resource availability for their critical workload.
      • A company running a high-performance computing cluster wants to guarantee access to specialized instances for their simulations. Capacity Blocks can provide this assurance, preventing resource contention and ensuring consistent performance.

      Background:

      The increasing demand for specialized hardware, especially GPUs, has made it challenging for users to guarantee access to these resources. AWS Capacity Blocks provide a solution to this problem by allowing users to reserve instances for a specific duration. This feature aims to integrate Capacity Block support into HCP to provide users with this capability.
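
Because a reservation covers a specific duration, any consumer of it needs to know whether the reserved window is currently open. The following is a small Python sketch of that check, assuming a reservation is described by a start time and a duration in hours; the function names `reservation_window` and `is_active` are hypothetical and do not correspond to any AWS or HCP API.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def reservation_window(start: datetime, duration_hours: int) -> Tuple[datetime, datetime]:
    """Return the (start, end) pair for a reservation of the given length."""
    if duration_hours < 1:
        raise ValueError("duration_hours must be at least 1")
    return start, start + timedelta(hours=duration_hours)


def is_active(start: datetime, duration_hours: int, now: datetime) -> bool:
    """True while `now` falls inside the reserved window (end is exclusive)."""
    begin, end = reservation_window(start, duration_hours)
    return begin <= now < end


if __name__ == "__main__":
    # Example values only: a 24-hour reservation starting at midnight UTC.
    start = datetime(2025, 1, 10, 0, 0, tzinfo=timezone.utc)
    print(is_active(start, 24, start + timedelta(hours=12)))
    print(is_active(start, 24, start + timedelta(hours=24)))
```

Treating the end of the window as exclusive avoids counting the reservation as usable at the exact moment it expires.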

      Customer Considerations:

      This feature will primarily benefit customers running critical workloads that require guaranteed access to specific instance types, such as those using GPUs for AI/ML or high-performance computing.

      Documentation Considerations:

      Documentation should be updated to guide users through creating, managing, and using Capacity Blocks within HCP, including best practices and limitations.

      Interoperability Considerations:

      This feature will primarily impact HCP and its integration with AWS. Potential interoperability concerns with other offerings such as ROSA, OSD, and ARO (where similar concepts exist) should be investigated.

              Adel Zaalouk (azaalouk)
              Subin M
              Yu Li
              Laura Hinson
              Alberto Garcia Lamela