  1. Red Hat OpenShift Data Science
  2. RHODS-3390

Document tainting options for GPU nodes

    • Type: Story
    • Resolution: Done
    • Priority: Major
    • RHODS_1.10.0_GA
    • None
    • Documentation
    • None

      Based on RHODS-3074:

      The initial scope for RHODS-2387 called for adding taints to the individual nodes that have GPUs. In testing, however, Luca found that node taints could not be changed after node creation (RHODS-3184), possibly only in the default machine pool. We therefore switched to the current method: adding GPU nodes in a separate machine pool that has the nvidia.com/gpu taint applied.

      We do not currently have instructions for users who have GPU and CPU nodes in the same pool.

      According to RHODS-3074 comments, there should be two options for adding GPUs to your cluster:

      1. Enabling GPU support for nodes in a new machine pool (Recommended, current process in RHODS-3235)

      • appropriate when not all nodes have GPUs, and you want GPU workloads scheduled to GPU nodes and CPU workloads scheduled to CPU nodes
      • add the nvidia.com/gpu taint with the NoSchedule effect to the whole GPU machine pool

      2. Enabling GPU support for nodes in an existing machine pool (struck out: not applicable, based on discussion with Landon, Erwan, and Luca)

      • appropriate when all nodes have GPUs and must be shared between GPU and CPU workloads
      • no taints applied because all nodes have GPUs, and applying a taint would block CPU-only workloads
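
      To make the effect of the nvidia.com/gpu NoSchedule taint in option 1 concrete, here is a minimal Python sketch of the matching rule between taints and tolerations. It is a simplified model for illustration, not the Kubernetes scheduler: the Taint and Toleration classes and the tolerates and can_schedule helpers are hypothetical names, and only the NoSchedule effect is modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str
    value: str = ""


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""  # empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if a single toleration matches a single taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if not tol.key:
        # An empty key is only meaningful together with operator "Exists".
        return tol.operator == "Exists"
    if tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    return tol.operator == "Equal" and tol.value == taint.value


def can_schedule(tolerations: List[Toleration], taints: List[Taint]) -> bool:
    """A pod fits only if every NoSchedule taint on the node is tolerated."""
    return all(
        any(tolerates(tol, taint) for tol in tolerations)
        for taint in taints
        if taint.effect == "NoSchedule"
    )


# Taint applied to the whole GPU machine pool, as described in option 1.
gpu_taint = Taint(key="nvidia.com/gpu", effect="NoSchedule")
gpu_toleration = Toleration(key="nvidia.com/gpu", operator="Exists", effect="NoSchedule")

cpu_pod_fits = can_schedule([], [gpu_taint])
gpu_pod_fits = can_schedule([gpu_toleration], [gpu_taint])
```

      In this model a CPU-only pod with no tolerations is kept off the tainted GPU pool, while a pod that carries the matching toleration is allowed on it. The toleration only permits scheduling; it does not by itself steer the pod toward GPU nodes.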

      Additional troubleshooting information for the case where the taint is set with a typo may also be useful.
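
      As one possible troubleshooting aid, the sketch below checks a taint string for common typos before it is applied. It assumes the taint is written in the key=value:effect or key:effect form and that the expected key is nvidia.com/gpu; the check_taint helper is a hypothetical name, not part of any OpenShift or Kubernetes tool.

```python
import difflib
from typing import List

# The three taint effects that Kubernetes defines.
VALID_EFFECTS = ("NoSchedule", "PreferNoSchedule", "NoExecute")


def check_taint(spec: str, expected_key: str = "nvidia.com/gpu") -> List[str]:
    """Return a list of human-readable problems found in a taint string."""
    key_value, sep, effect = spec.rpartition(":")
    if not sep:
        return [f"missing ':<effect>' in {spec!r}"]

    problems = []
    key = key_value.split("=", 1)[0]

    if effect not in VALID_EFFECTS:
        message = f"unknown effect {effect!r}"
        # Suggest the closest valid effect, if any is reasonably similar.
        hint = difflib.get_close_matches(effect, VALID_EFFECTS, n=1)
        if hint:
            message += f", did you mean {hint[0]!r}?"
        problems.append(message)

    if key != expected_key:
        problems.append(f"key {key!r} differs from expected {expected_key!r}")

    return problems


for candidate in ("nvidia.com/gpu:NoSchedule", "nvidia.com/gpu:NoShedule"):
    print(candidate, check_taint(candidate))
```

      An empty list means no problem was detected by these checks; it does not guarantee that the taint was applied to the intended machine pool.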

              rhn-support-chtyler Chris Tyler
              rhn-ecs-lbailey Laura Bailey
              Luca Giorgi