Type: Feature
Resolution: Done
Priority: Major
Feature Overview
This feature covers the creation of a small-scale, validated Red Hat OpenStack Platform (RHOSP) 18 cluster. This cluster will be specifically designed to simulate a production environment for the purpose of testing and validating hardware accelerators (like GPUs) for AI/ML workloads. The initial deployment will be in the BRQ2 location and will focus on validating the end-to-end customer experience.
Goals
- Deploy a functional, stable, small-scale RHOSP 18 cluster in the BRQ2 lab.
- Establish a repeatable, automated process for deploying and managing the cluster to ensure consistency.
- Create an environment suitable for testing and benchmarking various AI accelerator hardware.
- Enable engineering teams to effectively simulate and validate customer AI workload scenarios.
Requirements
Hardware Provisioning:
- Identify and secure sufficient physical servers in the BRQ2 lab to act as control, compute, and storage nodes.
- Ensure the selected hardware meets the minimum requirements for RHOSP 18 and the specific AI accelerators to be tested.
Network Configuration:
- Design and implement the necessary network infrastructure, including VLANs, subnets, and routing.
- The network must support all required RHOSP traffic types (Control, Internal API, Storage, Tenant, External).
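One way to keep the network design consistent and reviewable is to capture it in a version-controlled YAML plan. A minimal sketch is below; all VLAN IDs and CIDRs are illustrative placeholders, not the final BRQ2 allocations:

```yaml
# Example network plan for the BRQ2 cluster.
# VLAN IDs and CIDRs are placeholders -- substitute the lab's
# actual allocations before deployment.
networks:
  - name: ctlplane
    vlan: 100
    cidr: 192.168.100.0/24
    purpose: bare-metal provisioning and control traffic
  - name: internal_api
    vlan: 101
    cidr: 192.168.101.0/24
    purpose: internal RPC and API traffic between services
  - name: storage
    vlan: 102
    cidr: 192.168.102.0/24
    purpose: storage data path (e.g. Ceph/Cinder traffic)
  - name: tenant
    vlan: 103
    cidr: 192.168.103.0/24
    purpose: tenant overlay traffic (Geneve/VXLAN)
  - name: external
    vlan: 104
    cidr: 10.0.0.0/24
    purpose: floating IPs and external/API access
```

A plan file like this can feed both the switch configuration review and the deployment automation, so the two cannot silently diverge.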
Deployment Automation:
- Develop robust automation (e.g., using Ansible) for the bare-metal provisioning and deployment of the complete RHOSP 18 cluster.
- The automation must be idempotent, configurable, and version-controlled.
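As a sketch of what the automation entry point might look like, the following top-level Ansible playbook separates bare-metal provisioning from cluster deployment; the playbook, role, and group names here are hypothetical placeholders, not an agreed layout:

```yaml
# site.yml -- hypothetical top-level playbook for the BRQ2 deployment.
# Role and inventory group names are illustrative only.
- name: Provision bare-metal nodes
  hosts: baremetal
  gather_facts: false
  roles:
    - role: baremetal_provision   # PXE/Redfish provisioning; written to be idempotent

- name: Deploy RHOSP 18 cluster
  hosts: deployer
  roles:
    - role: rhosp_deploy          # drives the RHOSP 18 installer
      vars:
        rhosp_version: "18.0"
        environment_file: "{{ playbook_dir }}/environments/brq2.yml"
```

Keeping environment-specific values in a per-site vars file (here `environments/brq2.yml`) is one way to satisfy the "configurable and version-controlled" requirement while leaving the roles reusable for future labs.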
Acceptance Criteria
[ ] A RHOSP 18 cluster is successfully deployed and fully operational in the BRQ2 location.
[ ] All core OpenStack services (Keystone, Nova, Neutron, Cinder, Glance) are healthy, accessible via API/CLI, and pass health checks.
[ ] The entire deployment process is automated. The cluster can be torn down and redeployed from scratch using the automation with minimal manual intervention.
[ ] At least one model of AI accelerator can be successfully provisioned to a Nova instance (e.g., via PCI-passthrough) and is usable from within the guest OS.
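For the accelerator criterion, one common approach is Nova PCI passthrough, configured on the compute node hosting the device. The fragment below is a sketch; `10de:1db6` and the alias name `ai-accel` are placeholders for whichever accelerator is tested first:

```ini
# nova.conf on the compute node with the accelerator installed.
# 10de:1db6 is a placeholder PCI vendor:product ID -- replace with
# the actual IDs of the accelerator under test (see `lspci -nn`).
[pci]
device_spec = { "vendor_id": "10de", "product_id": "1db6" }
alias = { "vendor_id": "10de", "product_id": "1db6", "device_type": "type-PCI", "name": "ai-accel" }
```

A flavor can then request one device via the extra spec `pci_passthrough:alias=ai-accel:1`, and an instance booted from that flavor should expose the device to the guest OS, which is what this acceptance criterion verifies.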