Red Hat OpenStack Services on OpenShift
OSPRH-10724

RHEL AI support for Red Hat OpenStack and NVIDIA GPU in passthrough


      Feature Overview

      Enable support for RHEL AI on Red Hat OpenStack, allowing organizations to deploy, manage, and run RHEL AI workloads within their private cloud infrastructure powered by Red Hat OpenStack. This feature extends the deployment options for RHEL AI beyond bare-metal, IBM Cloud, AWS, Azure, and Google Cloud Platform, providing greater flexibility and control over AI workloads in on-premises environments.

      Goals 

      Primary User Personas:

      • Administrator persona: Gain the ability to deploy and manage RHEL AI in their Red Hat OpenStack environments with familiar tools.
      • MLOps and App Developers persona: Access RHEL AI capabilities within the organization's private cloud, facilitating collaboration and innovation.
      • As an OSP cloud user, I want to try an LLM on RHEL AI in a virtual machine deployed on my OSP 18 cloud.
      • As an OSP cloud user, I want to use the full potential of RHEL AI without allocating bare-metal hardware to RHEL AI outside of my OSP 18 cloud.

      Value:

      • Expanded deployment options: Users can now deploy RHEL AI on RHOSP/RHOSO, leveraging existing private cloud investments.
      • Data sovereignty and compliance: Organizations can keep sensitive data on-premises, meeting regulatory and compliance requirements.
      • Integrated management: IT operations teams can manage RHEL AI instances alongside other workloads within RHOSP/RHOSO, streamlining operations.
      • Optimized resource utilization: Efficient use of hardware resources, including NVIDIA GPU acceleration, within the RHOSP/RHOSO environment.

      Requirements

      • Compatibility:
        RHEL AI must be fully compatible with Red Hat OpenStack Services on OpenShift 18.0. It must support the essential Red Hat OpenStack services required for deployment.
      • Installation Process for Administrators:
        The administrator persona must be able to install RHEL AI using the RHEL AI qcow2 image file with clear, step-by-step guidance and tools. The RHEL AI qcow2 image should be available for download from the Red Hat Customer Portal.
        The image must be pre-configured and optimized for Red Hat OpenStack environments (cloud-init), including necessary drivers and settings. 
        Administrators should receive detailed instructions to import the RHEL AI qcow2 image into Glance, the Red Hat OpenStack Image service. The steps must cover both CLI and Horizon Dashboard methods. 
        Instructions should include setting appropriate image properties (e.g., name, disk format, container format, visibility) and verifying image integrity using checksums.
        Guidance must be provided on creating Red Hat OpenStack flavors that allocate sufficient resources (vCPUs, RAM, disk space) for RHEL AI workloads. Instructions on configuring flavors to support GPU resources are required.
        This includes enabling GPU passthrough and associating NVIDIA GPUs with the flavor.
        Procedures for launching RHEL AI instances using the imported qcow2 image and created flavors should cover instance naming conventions, key pair injection for SSH access, and selection of appropriate networks and security groups.
        A set of validation tests for administrators to confirm successful installation must be provided. 
      • Hardware Acceleration:
        Ensure support for GPU passthrough in Red Hat OpenStack. For this feature, compatibility is limited to NVIDIA GPUs.
      • Documentation:
        Provide comprehensive installation guides tailored for the administrator persona. 
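The administrator steps above can be sketched as `openstack` CLI commands. This is a minimal, illustrative sequence, not the official procedure: the image filename, checksum filename, flavor sizing, and the `nvidia-gpu` PCI alias are assumptions, and the alias must already be defined in the Nova configuration on the compute hosts for GPU passthrough to work.

```shell
# Verify image integrity first (checksum filename is an assumed example).
sha256sum -c rhel-ai.qcow2.sha256

# Import the RHEL AI qcow2 image into Glance with the properties
# described above (name, disk format, container format, visibility).
openstack image create "RHEL AI" \
  --file rhel-ai.qcow2 \
  --disk-format qcow2 \
  --container-format bare \
  --private

# Create a flavor sized for RHEL AI workloads that requests one
# passthrough GPU via a PCI alias (alias name is an assumption).
openstack flavor create rhel-ai.gpu \
  --vcpus 16 --ram 65536 --disk 200 \
  --property "pci_passthrough:alias"="nvidia-gpu:1"
```

The `pci_passthrough:alias` flavor property is how Nova requests PCI passthrough devices; the vCPU, RAM, and disk values here are placeholders to be sized per workload.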

      Non-Functional Requirements:

      • Performance:
        Provide performance benchmarks for RHEL AI on Red Hat OpenStack.
      • Scalability:
        Support scaling out by launching additional RHEL AI inference instances, with Octavia load-balancing the vLLM endpoints.

      Checklist:

      • Verify compatibility with Red Hat OpenStack Services on OpenShift 18.0.
      • Ensure GPU passthrough support is functional.
      • Update RHEL AI and Red Hat OpenStack documentation with new deployment guides.
      • Conduct security assessments and address any vulnerabilities.
      • Perform performance benchmarking and optimization.
      • Train support and field teams on the new feature.
      • Prepare release notes and customer communication.

      Use Cases - i.e. User Experience & Workflow:

      Successfully deploy RHEL AI on Red Hat OpenStack using the RHEL AI qcow2 image, enabling AI workloads within the organization's private cloud.

      Preparation:

      • Access Red Hat Customer Portal:
        • Administrator logs into the Red Hat Customer Portal using authorized credentials.
        • Navigates to the RHEL AI download section.
      • Download qcow2 Image:
        • Downloads the latest RHEL AI qcow2 image file.
        • Verifies the image integrity using provided checksums.
      • Importing Image into Red Hat OpenStack Glance:
        • Uses the openstack image create command.
        • Uploads the RHEL AI qcow2 image to Glance.
        • Sets image properties:
          • Name: "RHEL AI"
          • Disk Format: QCOW2
          • Container Format: Bare
          • Visibility: Private or Shared (as per policy)
      • Verify Image Upload:
        • Confirms that the image is successfully uploaded and available for use.
      • Configuring Red Hat OpenStack Resources:
        • Create Flavors:
          • Defines a new flavor tailored for RHEL AI instances, specifying:
            • vCPUs: Appropriate number for AI workloads.
            • RAM: Sufficient memory.
            • Disk Space: Adequate storage.
            • GPU Configuration:
              • If using GPU acceleration, includes GPU resources in the flavor.
              • Configures GPU passthrough for the flavor (PCI passthrough of the NVIDIA GPU).
        • Set Up Networking:
          • Configures necessary Neutron networks and subnets.
          • Ensures connectivity to required networks (e.g., management, storage, external).
      • Launching RHEL AI Instance:
        • Instance Creation:
          • Initiates the creation of a new instance using the RHEL AI image and the defined flavor.
          • Assigns the instance to the appropriate network(s).
          • Attaches the SSH key pair.
          • Sets instance metadata if needed.
        • Boot Process:
          • Monitors the instance status to ensure it transitions to "Active."
        • Assign Floating IP:
          • Allocates and associates a floating IP address for external access.
        • Access Instance:
          • Uses SSH to log into the RHEL AI instance with the private key.
        • Test GPU Drivers:
          • Verifies GPU recognition via console access and the nvidia-smi command.
        • Verify RHEL AI Services:
          • Configure RHEL AI and run the end-to-end workflow of synthetic data generation, training, and evaluation.
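The launch-and-verify steps above can be condensed into a short CLI sketch. All names here are illustrative assumptions: the instance name, network, key pair, security group, external network, flavor, and the `cloud-user` login (the typical default user for RHEL cloud images).

```shell
# Launch an instance from the imported image with a GPU-enabled flavor
# (flavor name "rhel-ai.gpu" is an assumed example).
openstack server create rhel-ai-01 \
  --image "RHEL AI" --flavor rhel-ai.gpu \
  --network private --key-name admin-key \
  --security-group rhel-ai-sg --wait

# Allocate a floating IP on the external network and attach it.
FIP=$(openstack floating ip create external -f value -c floating_ip_address)
openstack server add floating ip rhel-ai-01 "$FIP"

# Log in with the key pair and confirm the passthrough GPU is visible.
ssh -i ~/.ssh/admin-key cloud-user@"$FIP" nvidia-smi
```

If `nvidia-smi` lists the GPU, the instance is ready for the RHEL AI end-to-end workflow of synthetic data generation, training, and evaluation.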

      Documentation

      New installation section for Red Hat OpenStack
      New requirements section

              egallen Erwan Gallen