Story
Resolution: Done
Normal
If you taint your GPU machine pool with the nvidia.com/gpu taint, GPU support keeps working and your notebooks can still be scheduled on GPU nodes.
This is already the documented recommendation:
https://access.redhat.com/documentation/en-us/red_hat_openshift_data_science/1/html/m[…]ces/enabling-gpu-support-in-openshift-data-science_user-mgmt
However, the scale-down/scale-up step described in the doc is no longer required. Taints are now applied to, and removed from, running machines automatically and almost immediately.
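One way to confirm that the taint has reached every running node without a scale-down is to inspect the node list. The following is a minimal sketch, assuming the input is the parsed JSON from `oc get nodes -o json` (optionally filtered to the GPU machine pool with a label selector of your choosing). The function name and the sample node names are hypothetical and only serve the illustration.

```python
GPU_TAINT_KEY = "nvidia.com/gpu"
GPU_TAINT_EFFECT = "NoSchedule"


def nodes_missing_gpu_taint(node_list):
    """Return the names of nodes that lack the nvidia.com/gpu NoSchedule taint.

    `node_list` is a dict shaped like the output of `oc get nodes -o json`:
    a top-level "items" array of Node objects, each with optional
    spec.taints entries carrying "key" and "effect" fields.
    """
    missing = []
    for node in node_list.get("items", []):
        # spec.taints is absent (or null) on untainted nodes.
        taints = node.get("spec", {}).get("taints") or []
        has_taint = any(
            t.get("key") == GPU_TAINT_KEY and t.get("effect") == GPU_TAINT_EFFECT
            for t in taints
        )
        if not has_taint:
            missing.append(node["metadata"]["name"])
    return missing


# Hypothetical sample data: one node already tainted, one not yet tainted.
sample_nodes = {
    "items": [
        {
            "metadata": {"name": "gpu-node-a"},
            "spec": {"taints": [{"key": "nvidia.com/gpu", "effect": "NoSchedule"}]},
        },
        {"metadata": {"name": "gpu-node-b"}, "spec": {}},
    ]
}

untainted = nodes_missing_gpu_taint(sample_nodes)
```

An empty result for the GPU machine pool would suggest the taint was applied to the running machines in place, which is the behavior described above.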
Most of the second paragraph of the introduction to this module can now be removed:
Red Hat recommends that you use a separate machine pool for GPU nodes that have the nvidia.com/gpu NoSchedule taint.
If you edit an existing machine pool to add this taint, you must first scale the machine pool down to zero nodes, and then increase the machine pool to the number of nodes that you require. This ensures that the new taint is applied to all nodes in the machine pool. To ensure consistent behavior across all nodes in the machine pool, Red Hat recommends that you increase the scale of your machine nodes promptly. As scaling nodes to zero has a disruptive effect on your deployment, Red Hat recommends that you perform this action as soon as possible, while considering your service usage patterns when selecting an appropriate time.
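For reference, the effect of the nvidia.com/gpu NoSchedule taint on scheduling can be sketched with the standard Kubernetes toleration-matching rules. This is an illustrative sketch only, not the scheduler's code; the function names are hypothetical, and it assumes that GPU notebook pods carry a toleration for this taint, which the ticket does not state explicitly.

```python
def tolerates(toleration, taint):
    """Return True if a single toleration matches a single taint.

    Follows the documented Kubernetes rules: an empty toleration effect
    matches any effect, the default operator is "Equal", and an empty key
    with operator "Exists" matches every taint.
    """
    effect = toleration.get("effect", "")
    if effect and effect != taint.get("effect"):
        return False
    operator = toleration.get("operator", "Equal")
    key = toleration.get("key", "")
    if not key:
        return operator == "Exists"
    if key != taint.get("key"):
        return False
    if operator == "Exists":
        return True
    return toleration.get("value", "") == taint.get("value", "")


def can_schedule(pod_tolerations, node_taints):
    """Return True if no NoSchedule/NoExecute taint is left untolerated."""
    for taint in node_taints:
        # PreferNoSchedule is a soft preference and never blocks scheduling.
        if taint.get("effect") == "PreferNoSchedule":
            continue
        if not any(tolerates(t, taint) for t in pod_tolerations):
            return False
    return True


gpu_taint = {"key": "nvidia.com/gpu", "effect": "NoSchedule"}

# Assumed toleration for a GPU notebook pod (illustrative, not from the doc).
gpu_notebook_tolerations = [
    {"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"}
]

gpu_pod_fits = can_schedule(gpu_notebook_tolerations, [gpu_taint])
plain_pod_fits = can_schedule([], [gpu_taint])
```

Under these assumptions, a pod with the toleration can be placed on the tainted GPU nodes, while a pod without it is kept off them, which is the purpose of a dedicated GPU machine pool.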