  1. Red Hat OpenStack Services on OpenShift
  2. OSPRH-10968

RHEL AI e2e validation (Admin side)


    • Epic
    • Resolution: Done
    • Major
    • None
    • rhos-18.0.4
    • None
    • None
    • RHEL AI e2e validation (Admin side) - OSP configuration
    • M
    • False
    • False
    • Not Selected
    • Proposed
    • Proposed
    • To Do
    • RHOSSTRAT-135 - RHEL AI support Red Hat OpenStack and NVIDIA GPU in passthrough
    • Proposed
    • ?
    • 0% To Do, 0% In Progress, 100% Done

      After the basic validation of OSPRH-10948, we'll do a proper end-to-end validation of RHEL AI on RHOSO in a deployment that is closer to what a customer would have.

      The user in this case is a System Administrator, so we'll focus on what is needed to deploy and configure RHOSO to run RHEL AI, as well as the validation steps necessary to confirm that the deployment is working as expected.

      The control plane doesn't affect the testing, so the OCP cluster can be a single node, 3 masters, or 3 masters and 3 workers (virtualized or not).

      The RHOSO networking is not under test here, so the deployment can use isolated networks or not.

      The important part will be the edpm node, which must be a baremetal node with at least one NVIDIA GPU card. A second edpm node without a GPU card (virtual or baremetal) is needed to confirm that instance scheduling is working as expected.

      There is currently no support for vGPUs in the RHEL AI image, so the process will focus on PCI passthrough.
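As a rough illustration of what the PCI passthrough configuration involves, the sketch below builds the Nova `[pci]` option values (`device_spec` and `alias`) and the matching flavor property for an NVIDIA card. The product ID `20f1`, the alias name `nvidia-gpu`, and the helper names are assumptions made for this example, not values taken from this epic. The right `device_type` (`type-PF` or `type-PCI`) depends on the card, and how these options are delivered to the edpm nodes and the control plane in RHOSO is left to the documentation this epic produces.

```python
import json

NVIDIA_VENDOR_ID = "10de"  # PCI vendor ID assigned to NVIDIA


def nova_pci_options(product_id: str, alias_name: str,
                     device_type: str = "type-PF") -> dict:
    """Build Nova [pci] option values and the flavor property (hypothetical helper).

    product_id and alias_name are placeholders chosen by the administrator.
    device_type is an assumption: SR-IOV capable cards are usually reported
    as type-PF, other cards as type-PCI.
    """
    device_spec = {"vendor_id": NVIDIA_VENDOR_ID, "product_id": product_id}
    alias = {
        "vendor_id": NVIDIA_VENDOR_ID,
        "product_id": product_id,
        "device_type": device_type,
        "name": alias_name,
    }
    return {
        # Tells the compute service which host devices may be passed through.
        "device_spec": json.dumps(device_spec),
        # Names the device so that flavors can request it.
        "alias": json.dumps(alias),
        # Flavor extra spec requesting one device that matches the alias.
        "flavor_property": f"pci_passthrough:alias={alias_name}:1",
    }


def render_pci_section(options: dict) -> str:
    """Render the options as a [pci] section of a Nova configuration file."""
    return (
        "[pci]\n"
        f"device_spec = {options['device_spec']}\n"
        f"alias = {options['alias']}\n"
    )


if __name__ == "__main__":
    opts = nova_pci_options("20f1", "nvidia-gpu")  # illustrative values only
    print(render_pci_section(opts))
    print(opts["flavor_property"])
```

In upstream Nova the alias has to be known to the API service as well as the compute service, and the scheduler needs the `PciPassthroughFilter` enabled for the flavor property to steer instances to the GPU node.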

      Note: This epic covers only instances managed by Nova, not running RHOSO on baremetal using Ironic.

      Definition of Done:
      This epic will be considered as completed once we have documented the following:

      • Configuring RHOSO. This can be written as an addendum to the existing deployment documentation, indicating at which stages of the usual configuration things need to be added or changed to support RHEL AI instances.
        The configuration must include the steps for pre-provisioned edpm nodes as well as nodes provisioned by nova.
      • Validating the deployment by running a RHEL AI instance and confirming that the instance is scheduled on the edpm node with the GPU card, that PCI passthrough works correctly, that the NVIDIA drivers recognize the GPU, and that RHEL AI makes use of the GPU.
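Two of the validation checks in the last bullet can be sketched in a few lines: confirming that the instance landed on a GPU host, and confirming that an NVIDIA device is visible inside the guest. The snippet below is only a sketch under assumptions. It expects the text output of `lspci -nn` captured inside the instance, and the sample output, host names, and function names are made up for illustration.

```python
import re

NVIDIA_VENDOR_ID = "10de"  # PCI vendor ID assigned to NVIDIA

# Matches the "[vendor:device]" pair that `lspci -nn` appends to each line.
PCI_ID_RE = re.compile(r"\[([0-9a-f]{4}):([0-9a-f]{4})\]", re.IGNORECASE)


def nvidia_devices(lspci_nn_output: str) -> list[tuple[str, str]]:
    """Return (pci_address, product_id) for every NVIDIA device found."""
    found = []
    for line in lspci_nn_output.splitlines():
        ids = PCI_ID_RE.findall(line)
        # The last [xxxx:xxxx] pair on the line is the vendor:device ID.
        if ids and ids[-1][0].lower() == NVIDIA_VENDOR_ID:
            found.append((line.split()[0], ids[-1][1].lower()))
    return found


def scheduled_on_gpu_host(instance_host: str, gpu_hosts: set[str]) -> bool:
    """Check that the instance's compute host is one of the GPU nodes."""
    return instance_host in gpu_hosts


# Illustrative sample only, not captured from a real deployment.
SAMPLE_LSPCI = (
    "00:04.0 Ethernet controller [0200]: Red Hat, Inc. Virtio network device [1af4:1000]\n"
    "00:05.0 3D controller [0302]: NVIDIA Corporation GA100 [A100 PCIe 40GB] [10de:20f1] (rev a1)\n"
)

if __name__ == "__main__":
    print(nvidia_devices(SAMPLE_LSPCI))
    # Hypothetical host names for the GPU and non-GPU edpm nodes.
    print(scheduled_on_gpu_host("edpm-compute-gpu-0", {"edpm-compute-gpu-0"}))
```

The host name would typically come from the admin-only host attribute shown by `openstack server show`. A device that appears in `lspci` only proves the passthrough itself, so running `nvidia-smi` in the guest and a RHEL AI workload are still needed to confirm that the drivers and RHEL AI are using the GPU.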

      Note: Bear in mind that a part of the validation documentation overlaps with the user's epic, so the work of creating the glance image and running an instance should not be written twice.


              lmartins@redhat.com Lucas Martins
              geguileo@redhat.com Gorka Eguileor
              rhos-workloads-lightspeed