Performance and Scale for AI Platforms / PSAP-1056

Support NVIDIA GPUs on ostree-based MicroShift


    • nvidia gpu with ostree
    • Not Selected
    • False
    • False
    • None

      Epic Goal

      • We have a fully supported way to use NVIDIA GPUs on MicroShift, but only for RPM-based deployments. We expect the majority of edge deployments to be ostree-based (aka RHEL for Edge).
      • The goal of this epic is to provide a fully supported way to embed the necessary drivers, modules, device plugins, etc. into an ostree-based image.
      • The focus is on x86 first, but the same approach should also work on ARM in the long term.

      Why is this important?

      • Machine vision at the edge is a key use case. Several customers in the MicroShift early access program are implementing it.

      Scenarios

      1. An admin follows the provided instructions (e.g. adding the necessary packages, manifests, containers, etc. to the Image Builder blueprint). The resulting image can be installed on an edge device with an NVIDIA GPU, and the GPU workload runs without any further intervention.
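      As a rough illustration of this scenario, the Python sketch below renders a minimal Image Builder blueprint in TOML. It is not a supported recipe: the driver package name is a hypothetical placeholder, nvidia-container-toolkit is the package covered by the NVIDIA instructions linked under Previous Work, and passing rd.driver.blacklist=nouveau modprobe.blacklist=nouveau on the kernel command line is only one possible way to keep nouveau from loading. Manifests (e.g. for the device plugin) and container images are left out.

```python
"""Render a minimal Image Builder blueprint (TOML) for an ostree-based
MicroShift image with NVIDIA GPU support.

Assumptions (illustrative only, not a supported recipe):
- DRIVER_PACKAGE is a hypothetical placeholder for a precompiled NVIDIA
  kernel-module package; the real name depends on the repository used.
- The kernel arguments keep the in-tree nouveau driver from loading.
"""

DRIVER_PACKAGE = "nvidia-driver-placeholder"  # hypothetical package name


def render_blueprint(name: str, packages: list[str], kernel_append: str,
                     enabled_services: list[str]) -> str:
    """Return the blueprint as TOML text."""
    lines = [
        f'name = "{name}"',
        'description = "MicroShift with NVIDIA GPU support (sketch)"',
        'version = "0.0.1"',
        "",
    ]
    # One [[packages]] table per package; "*" lets the depsolver pick a version.
    for pkg in packages:
        lines += ["[[packages]]", f'name = "{pkg}"', 'version = "*"', ""]
    # Kernel command-line arguments are baked into the resulting image.
    lines += ["[customizations.kernel]", f'append = "{kernel_append}"', ""]
    # systemd services to enable in the image.
    quoted = ", ".join(f'"{s}"' for s in enabled_services)
    lines += ["[customizations.services]", f"enabled = [{quoted}]"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_blueprint(
        name="microshift-nvidia",
        packages=["microshift", "nvidia-container-toolkit", DRIVER_PACKAGE],
        kernel_append="rd.driver.blacklist=nouveau modprobe.blacklist=nouveau",
        enabled_services=["microshift"],
    ))
```

      The rendered file could then be pushed with composer-cli blueprints push before building an ostree commit.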

      Acceptance Criteria

      • CI - MUST be running successfully with tests automated
      • Release Technical Enablement - Provide necessary release enablement details and documents.
      • ...
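      For the CI criterion, a minimal automated check could confirm that the booted image exposes the GPU to Kubernetes. The sketch assumes that the NVIDIA device plugin is deployed and advertises the nvidia.com/gpu extended resource, and that oc is available on the test host; a fuller test would also schedule a GPU workload and check its output.

```python
"""Sketch of an automated CI check: after the ostree image boots, at least
one node should advertise allocatable NVIDIA GPUs.

Assumptions: the NVIDIA device plugin registers `nvidia.com/gpu`, and the
node list comes from `oc get nodes -o json` on the MicroShift host.
"""
import json
import subprocess

GPU_RESOURCE = "nvidia.com/gpu"


def allocatable_gpus(nodes_json: str) -> dict[str, int]:
    """Map node name -> allocatable GPU count from a NodeList JSON document."""
    nodes = json.loads(nodes_json)["items"]
    return {
        n["metadata"]["name"]: int(
            n["status"].get("allocatable", {}).get(GPU_RESOURCE, "0"))
        for n in nodes
    }


def check_cluster() -> None:
    """Fail if no node in the running cluster exposes a GPU."""
    out = subprocess.run(["oc", "get", "nodes", "-o", "json"],
                         check=True, capture_output=True, text=True).stdout
    gpus = allocatable_gpus(out)
    if not any(count > 0 for count in gpus.values()):
        raise RuntimeError(f"no {GPU_RESOURCE} allocatable: {gpus}")
```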

      Dependencies (internal and external)

      1. ...

      Previous Work (Optional):

      https://docs.nvidia.com/datacenter/cloud-native/edge/nvidia-gpu-with-device-edge.html#installing-the-nvidia-container-toolkit

       

      Open questions:

      1. The RPM instructions include some manual post-installation steps (e.g. disabling nouveau). How do we integrate these into ostree-based deployments? A post-installation kickstart step might not be a good idea, because not all installation methods use kickstart (e.g. the simplified installer). We might need to create a new package (e.g. microshift-nvidia-gpu) that adds the necessary systemd scripting.
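      One possible answer is to ship the configuration inside the package itself, so that it lands in the ostree commit at build time instead of being applied after installation. The sketch below lists the files such a hypothetical microshift-nvidia-gpu package might install: a standard modprobe.d drop-in that blacklists nouveau, and a one-shot systemd unit ordered before microshift.service. The file names, the unit name, and the setup.sh path are assumptions. A modprobe.d blacklist only affects early boot if it is included in the initramfs, so kernel arguments may still be needed.

```python
"""Sketch of the files a hypothetical `microshift-nvidia-gpu` package could
ship so that no post-installation (e.g. kickstart) step is needed.

Assumptions: the package name, file names, unit name and setup script path
are hypothetical; only the modprobe.d and systemd syntax are standard.
"""
import tempfile
from pathlib import Path

FILES = {
    # Read by modprobe; keeps the in-tree nouveau driver from binding the GPU.
    "usr/lib/modprobe.d/microshift-nvidia-blacklist-nouveau.conf": (
        "blacklist nouveau\n"
        "options nouveau modeset=0\n"
    ),
    # One-shot unit ordered before MicroShift; ExecStart is a placeholder.
    "usr/lib/systemd/system/microshift-nvidia-gpu-setup.service": (
        "[Unit]\n"
        "Description=Prepare NVIDIA GPU for MicroShift (sketch)\n"
        "Before=microshift.service\n"
        "\n"
        "[Service]\n"
        "Type=oneshot\n"
        "RemainAfterExit=yes\n"
        "ExecStart=/usr/libexec/microshift-nvidia-gpu/setup.sh\n"
        "\n"
        "[Install]\n"
        "WantedBy=multi-user.target\n"
    ),
}


def stage(buildroot: Path) -> list[Path]:
    """Write FILES under buildroot (e.g. an RPM %install root); return paths."""
    written = []
    for rel, content in FILES.items():
        target = buildroot / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content)
        written.append(target)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for path in stage(Path(root)):
            print(path.relative_to(root))
```

      Because the files live under /usr, they become part of the immutable ostree image, which fits the read-only model of RHEL for Edge better than editing the system after installation.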

      Done Checklist

      • CI - CI is running, tests are automated and merged.
      • Release Enablement <link to Feature Enablement Presentation>
      • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
      • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
      • DEV - Downstream build attached to advisory: <link to errata>
      • QE - Test plans in Polarion: <link or reference to Polarion>
      • QE - Automated tests merged: <link or reference to automated tests>
      • DOC - Downstream documentation merged: <link to meaningful PR>

              rhn-support-dfeddema Diane Feddema
              dfroehli42rh Daniel Fröhlich
              Votes: 0
              Watchers: 8