OpenShift Node / OCPNODE-4107

Deploy dra-example-driver on OpenShift for hardware-independent DRA certification

    • Type: Epic
    • Resolution: Unresolved
    • Priority: Major
    • Deploy dra-example-driver on OpenShift
    • In Progress
    • 80% To Do, 20% In Progress, 0% Done

      Goal

      • Deploy the upstream kubernetes-sigs/dra-example-driver on OpenShift as a hardware-independent reference DRA driver for certifying DRA features
      • Contribute OpenShift support (SCC, UBI image, deployment docs) upstream

      Why is this important?

      • Decouples DRA feature certification from third-party drivers and hardware availability
      • The dra-example-driver is the upstream SIG Node reference implementation. It simulates GPU devices via environment variables, so no real hardware is needed
      • Enables repeatable DRA testing in CI without GPU nodes
      • We can still test on third-party drivers (e.g., NVIDIA), but this gives a reliable baseline that is not tied to vendor release cycles
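To make the "simulated via environment variables" point concrete, here is a minimal sketch of how a test could read the mock devices back out of a container's environment. The `GPU_DEVICE_<n>` naming scheme and the sample values are assumptions for illustration, not taken from this ticket; check them against the driver version actually deployed.

```python
import re

# Assumed naming scheme for the variables the driver injects (hypothetical;
# verify against the deployed dra-example-driver before relying on it).
DEVICE_VAR = re.compile(r"^GPU_DEVICE_(\d+)$")


def mock_devices(env):
    """Return {index: device_id} for every variable that matches DEVICE_VAR."""
    found = {}
    for name, value in env.items():
        match = DEVICE_VAR.match(name)
        if match:
            found[int(match.group(1))] = value
    # Sort by device index so callers get a stable order.
    return dict(sorted(found.items()))


# Hypothetical environment, as it might be captured from a test pod.
sample_env = {
    "PATH": "/usr/bin",
    "GPU_DEVICE_1": "gpu-1",
    "GPU_DEVICE_0": "gpu-0",
    "GPU_DEVICE_0_SHARING_STRATEGY": "TimeSlicing",
}

if __name__ == "__main__":
    print(mock_devices(sample_env))
```

A CI check could capture a pod's environment, pass it to `mock_devices`, and assert on the number of devices returned.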

      Scenarios

      1. Engineer deploys dra-example-driver via Helm on an OCP 4.21+ cluster and verifies kubelet plugin pods are running on all nodes
      2. Engineer creates ResourceClaims and DeviceClasses, then validates device allocation and scheduling using mock GPU devices
      3. CI job deploys the driver and runs DRA e2e tests on every PR/nightly without requiring GPU hardware
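The objects in scenario 2 could look roughly like the following sketch, which builds them as plain dictionaries and prints a JSON `List`. It assumes the `resource.k8s.io/v1` API is served and that the driver publishes devices under the name `gpu.example.com`; the namespace, object names, and test image are hypothetical placeholders.

```python
import json

# Assumptions: driver name as used in upstream examples; everything else is a placeholder.
DRIVER_NAME = "gpu.example.com"
NAMESPACE = "dra-demo"          # hypothetical namespace
DEVICE_CLASS = "mock-gpu"       # hypothetical DeviceClass name
CLAIM_NAME = "single-mock-gpu"  # hypothetical ResourceClaim name
IMAGE = "registry.access.redhat.com/ubi9/ubi-minimal"  # assumed test image


def build_manifests():
    """Return a Kubernetes List with a Namespace, DeviceClass, ResourceClaim, and Pod."""
    namespace = {"apiVersion": "v1", "kind": "Namespace", "metadata": {"name": NAMESPACE}}
    device_class = {
        "apiVersion": "resource.k8s.io/v1",
        "kind": "DeviceClass",
        "metadata": {"name": DEVICE_CLASS},
        # CEL selector: match every device published by the example driver.
        "spec": {"selectors": [{"cel": {"expression": f"device.driver == '{DRIVER_NAME}'"}}]},
    }
    claim = {
        "apiVersion": "resource.k8s.io/v1",
        "kind": "ResourceClaim",
        "metadata": {"name": CLAIM_NAME, "namespace": NAMESPACE},
        "spec": {"devices": {"requests": [{
            "name": "gpu",
            "exactly": {"deviceClassName": DEVICE_CLASS,
                        "allocationMode": "ExactCount", "count": 1},
        }]}},
    }
    pod = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "mock-gpu-consumer", "namespace": NAMESPACE},
        "spec": {
            # Pod-level reference to the claim, then a container-level use of it.
            "resourceClaims": [{"name": "gpu", "resourceClaimName": CLAIM_NAME}],
            "containers": [{
                "name": "ctr",
                "image": IMAGE,
                "command": ["sleep", "infinity"],
                "resources": {"claims": [{"name": "gpu"}]},
            }],
        },
    }
    return {"apiVersion": "v1", "kind": "List",
            "items": [namespace, device_class, claim, pod]}


if __name__ == "__main__":
    print(json.dumps(build_manifests(), indent=2))
```

The output can be piped to `oc apply -f -`. If the pod reaches Running and the claim reports an allocation in its status, the allocation and scheduling path has been exercised end to end.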

      Acceptance Criteria

      • CI - MUST be running successfully with tests automated
      • Release Technical Enablement - Provide necessary release enablement details and documents
      • dra-example-driver deploys and runs on OCP 4.21+ (K8s 1.34) with DRA enabled by default
      • Helm chart includes OpenShift SCC for the kubelet plugin DaemonSet
      • Container image built on UBI base image
      • Core DRA workflows validated: ResourceClaim creation, device allocation, pod scheduling
      • Changes contributed upstream to kubernetes-sigs/dra-example-driver
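As a starting point for the SCC criterion, the sketch below generates a SecurityContextConstraints object bound to the plugin's service account. It assumes the kubelet plugin DaemonSet needs hostPath volumes and a privileged container; the SCC name, namespace, and service account are hypothetical, and the permissions should be tightened once the plugin's real requirements are confirmed.

```python
import json


def kubelet_plugin_scc(namespace, service_account, privileged=True):
    """Build an SCC for the kubelet plugin DaemonSet (permissions are assumptions)."""
    return {
        "apiVersion": "security.openshift.io/v1",
        "kind": "SecurityContextConstraints",
        "metadata": {"name": "dra-example-driver-kubeletplugin"},  # hypothetical name
        # Assumed: the plugin container runs privileged, as a worst-case baseline.
        "allowPrivilegedContainer": privileged,
        "allowPrivilegeEscalation": privileged,
        # Assumed: the plugin mounts kubelet plugin directories from the host.
        "allowHostDirVolumePlugin": True,
        "allowHostNetwork": False,
        "allowHostPorts": False,
        "allowHostPID": False,
        "allowHostIPC": False,
        "readOnlyRootFilesystem": False,
        "runAsUser": {"type": "RunAsAny"},
        "seLinuxContext": {"type": "RunAsAny"},
        "fsGroup": {"type": "RunAsAny"},
        "supplementalGroups": {"type": "RunAsAny"},
        "volumes": ["hostPath", "configMap", "secret", "projected", "emptyDir"],
        # Grant the SCC only to the plugin's service account.
        "users": [f"system:serviceaccount:{namespace}:{service_account}"],
    }


if __name__ == "__main__":
    # Hypothetical namespace and service account names.
    scc = kubelet_plugin_scc("dra-example-driver", "dra-example-driver-service-account")
    print(json.dumps(scc, indent=2))
```

In the Helm chart, the equivalent template would presumably be gated behind a value so that non-OpenShift clusters, which lack the SCC API, are unaffected.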

      Dependencies (internal and external)

      1. DRA enabled by default in OCP 4.21 (OCPNODE-3895 — already merged)
      2. kubernetes-sigs/dra-example-driver upstream repo: https://github.com/kubernetes-sigs/dra-example-driver
      3. cert-manager (optional, only if webhook validation is needed)

      Previous Work

      1. OCPNODE-4079: Implement partitionable devices support in dra-example-driver
      2. OCPNODE-4043: e2e tests to validate downstream DRA APIs with NVIDIA GPU
      3. https://github.com/openshift/api/pull/2498 — DRA feature gate enabled by default

      Open questions

      1. Should we create a new OCPNODE component for dra-example-driver or reuse an existing one?
      2. Webhook: use cert-manager or OpenShift service-serving certificates?

      Done Checklist

      • CI - CI is running, tests are automated and merged.
      • Release Enablement - <link to Feature Enablement Presentation>
      • DEV - Upstream code and tests merged: <link to PR on kubernetes-sigs/dra-example-driver>
      • DEV - Upstream documentation merged: <link>
      • QE - Test plans in Polarion: <link>
      • QE - Automated tests merged: <link>
      • DOC - Downstream documentation merged: <link>

              Assignee: Unassigned
              Reporter: Harshal Patil (harpatil@redhat.com)