Type: Story
Resolution: Done
Priority: Major
Fix Version/s: RHODS_1.10.0_GA
Affects Version/s: None
Component/s: Documentation
Labels:
None

Story Points:
5
Blocked:
False
Blocked Reason:
None
Ready:
False
Automated:
No
CDW devel_ack:
CDW docs_ack:
CDW pm_ack:
CDW qa_ack:
CDW release:
Regression:
No
Target Release:
RHODS_1.10.0_GA
Test Blocker:
No
Test Coverage:
N/A
Watchlist Impact:
None
Git Pull Request:
https://gitlab.cee.redhat.com/documentation-red-hat-openshift-data-science-documentation/openshift-data-science-documentation/-/merge_requests/471

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Based on RHODS-3074:

The initial scope for ~~RHODS-2387~~ indicated that taints should be added to individual nodes with GPUs. However, during testing, Luca found that node taints could not be changed after node creation (RHODS-3184), possibly only in the default machine pool. We therefore switched to the current method: adding GPU nodes in a separate machine pool that has the nvidia.com/gpu taint applied.

We do not currently have instructions for users that have GPU and CPU nodes in the same pool.

According to the comments on RHODS-3074, there should be two options for adding GPUs to a cluster:

1. Enabling GPU support for nodes in a new machine pool (recommended; the current process in ~~RHODS-3235~~)

Appropriate when not all nodes have GPUs and you want to schedule GPU workloads to GPU nodes and CPU workloads to CPU nodes.
Add the nvidia.com/gpu NoSchedule taint to the whole GPU machine pool.

~~2. Enabling GPU support for nodes in an existing machine pool~~

~~Appropriate when all nodes have GPUs and must be shared between both GPU and CPU workloads.~~
~~No taints are applied, because all nodes have GPUs and applying a taint would block CPU-only workloads.~~
Not applicable, based on discussion with Landon, Erwan, and Luca.
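The scheduling behavior that option 1 relies on can be sketched as follows. This is a simplified, illustrative Python model of the Kubernetes taint and toleration matching rules, not product code: the `Taint`, `Toleration`, `tolerates`, and `can_schedule` names are hypothetical, and the sketch assumes GPU workloads carry a toleration for the `nvidia.com/gpu` key while CPU-only workloads carry none.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical, simplified model of Kubernetes taint/toleration matching.
# tolerationSeconds and NoExecute eviction of running pods are not modeled.


@dataclass(frozen=True)
class Taint:
    key: str
    value: str = ""
    effect: str = "NoSchedule"  # NoSchedule, PreferNoSchedule, or NoExecute


@dataclass(frozen=True)
class Toleration:
    key: str = ""  # an empty key with operator "Exists" tolerates every taint
    operator: str = "Equal"  # "Equal" (the Kubernetes default) or "Exists"
    value: str = ""
    effect: str = ""  # an empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if a single toleration matches a single taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.operator == "Exists":
        return tol.key == "" or tol.key == taint.key
    # "Equal": both key and value must match exactly
    return tol.key == taint.key and tol.value == taint.value


def can_schedule(tolerations: List[Toleration], taints: List[Taint]) -> bool:
    """Return True if every hard (NoSchedule/NoExecute) taint is tolerated."""
    for taint in taints:
        if taint.effect == "PreferNoSchedule":
            continue  # soft preference only; does not block scheduling
        if not any(tolerates(tol, taint) for tol in tolerations):
            return False
    return True


# Taint applied to the whole GPU machine pool (option 1)
gpu_pool_taints = [Taint(key="nvidia.com/gpu", effect="NoSchedule")]

# Assumed workload tolerations: GPU workloads tolerate the taint, CPU ones do not
gpu_workload = [Toleration(key="nvidia.com/gpu", operator="Exists", effect="NoSchedule")]
cpu_workload: List[Toleration] = []

print(can_schedule(gpu_workload, gpu_pool_taints))  # True
print(can_schedule(cpu_workload, gpu_pool_taints))  # False
```

Because the taint sits on the GPU pool, CPU-only workloads need no changes and are kept off GPU nodes. A taint only repels; it does not attract GPU workloads to GPU nodes, which still depends on those workloads requesting GPU resources.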

Additional troubleshooting information for typos made when setting the taint may also be useful.
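For that troubleshooting information, a check along the following lines could catch common typos in the taint before it is applied. This is a hypothetical Python sketch: `check_gpu_taint` is not part of any product tooling, and it assumes the taint is written in the `key=value:Effect` form used by `kubectl taint` (where `=value` is optional), with the expected key and effect taken from option 1.

```python
import difflib
from typing import List

# Expected values from option 1; the effect names are the three Kubernetes taint effects.
EXPECTED_KEY = "nvidia.com/gpu"
VALID_EFFECTS = ["NoSchedule", "PreferNoSchedule", "NoExecute"]


def check_gpu_taint(spec: str) -> List[str]:
    """Hypothetical helper: return problems found in a taint written as key[=value]:Effect."""
    problems: List[str] = []
    # Split on the last ':' so the key (which may contain '/' and '.') stays intact
    key_value, sep, effect = spec.rpartition(":")
    if not sep:
        return [f"missing ':Effect' suffix in {spec!r}"]
    key = key_value.split("=", 1)[0]
    if key != EXPECTED_KEY:
        problems.append(f"key {key!r} does not match {EXPECTED_KEY!r}")
    if effect not in VALID_EFFECTS:
        # Suggest the closest valid effect name for likely misspellings
        hint = difflib.get_close_matches(effect, VALID_EFFECTS, n=1)
        suggestion = f"; did you mean {hint[0]!r}?" if hint else ""
        problems.append(f"unknown effect {effect!r}{suggestion}")
    elif effect != "NoSchedule":
        problems.append(f"effect is {effect!r}, expected 'NoSchedule'")
    return problems


for spec in ["nvidia.com/gpu=:NoSchedule", "nvidia.com/gpus=:NoSchedule", "nvidia.com/gpu=:NoSchedul"]:
    print(spec, check_gpu_taint(spec))
```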

Attachment: image-2022-03-24-08-24-41-818.png (62 kB, 2022/03/24 12:24 PM)

Mentioned on:

Merge request - Merge branch 'RHODS-3390-tainting-options-for-GPU-nodes' into 'stage-1.10'

Merge request - RHODS-3390

Assignee:: Chris Tyler

Reporter:: Laura Bailey

QA Contact:: Luca Giorgi

Votes:: 0

Watchers:: 5

Created:: 2022/03/22 3:37 AM

Updated:: 2022/05/10 2:54 PM

Resolved:: 2022/04/28 5:40 PM
