Type: Feature
Resolution: Done
Priority: Major
Feature Overview
This feature covers the creation of a small-scale, validated Red Hat OpenStack Platform (RHOSP) 18 cluster. This cluster will be specifically designed to simulate a production environment for the purpose of testing and validating hardware accelerators (like GPUs) for AI/ML workloads. The initial deployment will be in the BRQ2 location and will focus on validating the end-to-end customer experience.
Goals
- Deploy a functional, stable, small-scale RHOSP 18 cluster in the BRQ2 lab.
- Establish a repeatable, automated process for deploying and managing the cluster to ensure consistency.
- Create an environment suitable for testing and benchmarking various AI accelerator hardware.
- Enable engineering teams to effectively simulate and validate customer AI workload scenarios.
Requirements
Hardware Provisioning:
- Identify and secure sufficient physical servers in the BRQ2 lab to act as control, compute, and storage nodes.
- Ensure the selected hardware meets the minimum requirements for RHOSP 18 and the specific AI accelerators to be tested.
Network Configuration:
- Design and implement the necessary network infrastructure, including VLANs, subnets, and routing.
- The network must support all required RHOSP traffic types (Control, Internal API, Storage, Tenant, External).
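One way to keep the network design consistent and reviewable is to capture it in a version-controlled YAML plan. A minimal sketch is below; all VLAN IDs and CIDRs are illustrative placeholders, not the final BRQ2 allocations:

```yaml
# Example network plan for the BRQ2 cluster.
# VLAN IDs and CIDRs are placeholders -- substitute the lab's
# actual allocations before deployment.
networks:
  - name: ctlplane
    vlan: 100
    cidr: 192.168.100.0/24
    purpose: bare-metal provisioning and control traffic
  - name: internal_api
    vlan: 101
    cidr: 192.168.101.0/24
    purpose: internal RPC and API traffic between services
  - name: storage
    vlan: 102
    cidr: 192.168.102.0/24
    purpose: storage data path (e.g. Ceph/Cinder traffic)
  - name: tenant
    vlan: 103
    cidr: 192.168.103.0/24
    purpose: tenant overlay traffic (Geneve/VXLAN)
  - name: external
    vlan: 104
    cidr: 10.0.0.0/24
    purpose: floating IPs and external/API access
```

A plan file like this can feed both the switch configuration review and the deployment automation, so the two cannot silently diverge.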
Deployment Automation:
- Develop robust automation (e.g., using Ansible) for the bare-metal provisioning and deployment of the complete RHOSP 18 cluster.
- The automation must be idempotent, configurable, and version-controlled.
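As a sketch of what the automation entry point might look like, the following top-level Ansible playbook separates bare-metal provisioning from cluster deployment; the playbook, role, and group names here are hypothetical placeholders, not an agreed layout:

```yaml
# site.yml -- hypothetical top-level playbook for the BRQ2 deployment.
# Role and inventory group names are illustrative only.
- name: Provision bare-metal nodes
  hosts: baremetal
  gather_facts: false
  roles:
    - role: baremetal_provision   # PXE/Redfish provisioning; written to be idempotent

- name: Deploy RHOSP 18 cluster
  hosts: deployer
  roles:
    - role: rhosp_deploy          # drives the RHOSP 18 installer
      vars:
        rhosp_version: "18.0"
        environment_file: "{{ playbook_dir }}/environments/brq2.yml"
```

Keeping environment-specific values in a per-site vars file (here `environments/brq2.yml`) is one way to satisfy the "configurable and version-controlled" requirement while leaving the roles reusable for future labs.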
Acceptance Criteria
[ ] A RHOSP 18 cluster is successfully deployed and fully operational in the BRQ2 location.
[ ] All core OpenStack services (Keystone, Nova, Neutron, Cinder, Glance) are healthy, accessible via API/CLI, and pass health checks.
[ ] The entire deployment process is automated. The cluster can be torn down and redeployed from scratch using the automation with minimal manual intervention.
[ ] At least one model of AI accelerator can be successfully provisioned to a Nova instance (e.g., via PCI-passthrough) and is usable from within the guest OS.
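For the accelerator criterion, one common approach is Nova PCI passthrough, configured on the compute node hosting the device. The fragment below is a sketch; `10de:1db6` and the alias name `ai-accel` are placeholders for whichever accelerator is tested first:

```ini
# nova.conf on the compute node with the accelerator installed.
# 10de:1db6 is a placeholder PCI vendor:product ID -- replace with
# the actual IDs of the accelerator under test (see `lspci -nn`).
[pci]
device_spec = { "vendor_id": "10de", "product_id": "1db6" }
alias = { "vendor_id": "10de", "product_id": "1db6", "device_type": "type-PCI", "name": "ai-accel" }
```

A flavor can then request one device via the extra spec `pci_passthrough:alias=ai-accel:1`, and an instance booted from that flavor should expose the device to the guest OS, which is what this acceptance criterion verifies.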