-
Story
-
Resolution: Done
-
Major
Based on RHODS-3074:
The initial scope for RHODS-2387 called for adding taints to the individual nodes that had GPUs. However, while testing this, Luca found that node taints could not be changed after node creation (RHODS-3184), possibly only in the default machine pool. We therefore switched to the current method: adding GPU nodes in a separate machine pool that has the nvidia.com/gpu taint applied.
We do not currently have instructions for users who have GPU and CPU nodes in the same pool.
According to RHODS-3074 comments, there should be two options for adding GPUs to your cluster:
1. Enabling GPU support for nodes in a new machine pool (Recommended, current process in RHODS-3235)
- appropriate when not all nodes have GPUs, and you want to schedule GPU workloads to GPU nodes, and CPU workloads to CPU nodes
- add nvidia.com/gpu NoSchedule taint to the whole GPU machine pool
2. Enabling GPU support for nodes in an existing machine pool (struck out: not applicable based on discussion with Landon, Erwan, and Luca)
- appropriate when all nodes have GPUs and must be shared between GPU and CPU workloads
- no taints applied, because all nodes have GPUs and applying a taint would block CPU-only workloads
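To illustrate why option 1 keeps CPU workloads off GPU nodes, here is a minimal Python sketch of how Kubernetes matches tolerations against taints. It is a simplified model for documentation purposes, not the scheduler's code; the class and function names (Taint, Toleration, tolerates, schedulable) are hypothetical, and it only considers the hard effects NoSchedule and NoExecute.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str
    value: str = ""


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""         # empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if this toleration matches this taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.operator == "Exists":
        # An empty key with Exists matches every taint key.
        return tol.key == "" or tol.key == taint.key
    return tol.key == taint.key and tol.value == taint.value


def schedulable(pod_tolerations, node_taints) -> bool:
    """True when every hard taint on the node is tolerated by the pod.

    PreferNoSchedule is a soft preference, so it is ignored here.
    """
    return all(
        any(tolerates(tol, taint) for tol in pod_tolerations)
        for taint in node_taints
        if taint.effect in ("NoSchedule", "NoExecute")
    )


# Assumed example: a GPU machine pool tainted as described in option 1.
gpu_pool_taints = [Taint(key="nvidia.com/gpu", effect="NoSchedule")]
cpu_pod = []  # no tolerations
gpu_pod = [Toleration(key="nvidia.com/gpu", operator="Exists", effect="NoSchedule")]

print(schedulable(cpu_pod, gpu_pool_taints))  # CPU-only pod on a GPU node
print(schedulable(gpu_pod, gpu_pool_taints))  # GPU pod on a GPU node
print(schedulable(cpu_pod, []))               # CPU-only pod on an untainted node
```

In this model, a pod without a matching toleration is rejected by every node in the tainted pool, which is the behavior that made the struck-out option 2 avoid taints.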
Some additional troubleshooting information may also be useful, in case of typos when setting the taint.
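As one possible form of that troubleshooting help, the sketch below checks a taint string for common typos. It assumes the taint is written in the key[=value]:effect form accepted by kubectl taint; the machine pool UI or CLI may accept a different format, so treat this as an illustration only. The helper names (parse_taint, check_gpu_taint) are hypothetical. Effect names are case-sensitive in Kubernetes.

```python
# Effects that Kubernetes accepts on a taint (case-sensitive).
VALID_EFFECTS = {"NoSchedule", "PreferNoSchedule", "NoExecute"}

# Taint key used for the GPU machine pool in this ticket.
EXPECTED_KEY = "nvidia.com/gpu"


def parse_taint(spec: str):
    """Split 'key[=value]:effect' into (key, value, effect)."""
    head, sep, effect = spec.rpartition(":")
    if not sep:
        raise ValueError(f"{spec!r} has no ':effect' suffix")
    key, _, value = head.partition("=")
    return key, value, effect


def check_gpu_taint(spec: str):
    """Return a list of human-readable problems; empty means it looks fine."""
    try:
        key, _value, effect = parse_taint(spec)
    except ValueError as err:
        return [str(err)]
    problems = []
    if key != EXPECTED_KEY:
        problems.append(f"key is {key!r}, expected {EXPECTED_KEY!r}")
    if effect not in VALID_EFFECTS:
        problems.append(
            f"effect is {effect!r}, expected one of {sorted(VALID_EFFECTS)}"
        )
    return problems


for candidate in (
    "nvidia.com/gpu:NoSchedule",
    "nvida.com/gpu:NoSchedule",   # misspelled key
    "nvidia.com/gpu:Noschedule",  # wrong capitalization in the effect
    "nvidia.com/gpu",             # missing effect
):
    print(candidate, "->", check_gpu_taint(candidate) or "ok")
```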