Type: Bug
Resolution: Done
Priority: Critical
Fix Version/s: RHODS_1.10.0_GA
Affects Version/s: None
Component/s: Integrations, Workbenches
Labels:
- eng
- groomed

Story Points:
3
Blocked:
False
Ready:
False
Affects:
Release Notes
Automated:
No
CDW devel_ack:
CDW docs_ack:
CDW pm_ack:
CDW qa_ack:
CDW release:
Fixed in Build:
1.10.0-6
Regression:
No
Release Note Text:

Incorrect number of available GPUs was displayed in JupyterHub:: When a user attempted to create a notebook instance in JupyterHub, the maximum number of GPUs available for scheduling was not updated as GPUs were assigned. As a result, a user could request a GPU that was already assigned to another notebook server, and the spawn then stalled until that GPU was released or the spawn timed out.
Release Note Status:
Documented as Resolved Issue
Target Release:
RHODS_1.10.0_GA
Test Blocker:
No
Test Coverage:
Yes
Watchlist Impact:
None
Git Pull Request:
https://github.com/red-hat-data-services/jupyterhub-singleuser-profiles/pull/57
Market:

Sprint:
MODH Sprint 1.9, MODH Sprint 1.10

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

When GPUs are enabled and the JupyterHub (JH) spawner shows the GPU selection dropdown, the maximum number of GPUs that can be requested does not decrease as GPUs are assigned.

If the cluster has 1 GPU available and user1 spawns a server with 1 GPU attached, user2 still sees 1 GPU available in the spawner. Furthermore, if user2 tries to spawn a server requesting 1 GPU, they are stuck waiting until either the JH timeout (10 minutes) expires or user1 stops their server and releases the GPU.
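The behaviour the spawner should have can be sketched as follows. This is a minimal, hypothetical Python sketch, not the change merged in the linked pull request: it assumes the spawner can read each node's allocatable GPU count (the `nvidia.com/gpu` extended resource) and the GPU requests of pods bound to each node. Assuming a notebook pod cannot span nodes, the largest value the dropdown should offer is the maximum free count on any single node, and that value must drop as pods claim GPUs. The `Pod` class and `max_requestable_gpus` function are illustrative names only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Pod:
    """Hypothetical, simplified view of a pod's GPU usage."""
    node_name: Optional[str]  # None while the pod is not yet scheduled
    phase: str                # "Pending", "Running", "Succeeded", "Failed", ...
    gpu_request: int          # value requested for nvidia.com/gpu


def max_requestable_gpus(allocatable: Dict[str, int], pods: List[Pod]) -> int:
    """Return the largest GPU count a single new pod could be scheduled with.

    `allocatable` maps node name -> allocatable GPUs on that node. A pod cannot
    span nodes, so the result is the largest per-node free count, not the
    cluster-wide total.
    """
    used = {node: 0 for node in allocatable}
    for pod in pods:
        # Only pods bound to a known node and not yet terminated hold GPUs.
        if pod.node_name in used and pod.phase not in ("Succeeded", "Failed"):
            used[pod.node_name] += pod.gpu_request
    return max((max(0, allocatable[n] - used[n]) for n in allocatable), default=0)


# Scenario from this issue: one GPU node with a single GPU.
nodes = {"gpu-node-1": 1}
print(max_requestable_gpus(nodes, []))        # 1, before user1 spawns
user1 = Pod(node_name="gpu-node-1", phase="Running", gpu_request=1)
print(max_requestable_gpus(nodes, [user1]))   # 0, what user2 should see
```

Recomputing this value each time the spawner form is rendered, rather than only from node capacity, is what would make the dropdown reflect GPUs already assigned to other users.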

Prerequisites (if any, like setup, operators/versions):

RHODS 1.7.0-5 on OSD running OCP 4.10; GPU operator installed; at least 1 GPU node provisioned on the cluster

Steps to Reproduce

1. Log in as user 1.
2. Spawn a notebook server with 1 GPU attached.
3. Log out without stopping the server.
4. Log in as user 2.
5. Try to spawn a notebook server requesting 1 GPU.

Actual results:

User 2 can request 1 GPU, but the server is not spawned because of a lack of available resources. If the 10-minute timeout passes, the spawning process fails.

Expected results:

User 2 should not see any GPUs available if the GPU is already attached to user1's server.
When spawning the server, the user should not be stuck waiting for 10 minutes.
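The two expected results could be met roughly as sketched below. This is a hypothetical Python sketch: `gpu_dropdown_options` and `validate_gpu_request` are illustrative names not taken from jupyterhub-singleuser-profiles, and `max_available` is assumed to be the current count of free GPUs computed elsewhere. The dropdown offers only what is free, and a request that cannot be satisfied is rejected immediately instead of leaving a pod Pending until the timeout.

```python
def gpu_dropdown_options(max_available: int) -> list:
    """Choices for the GPU selector: 0 up to the GPUs currently free."""
    return list(range(max(0, max_available) + 1))


def validate_gpu_request(requested: int, max_available: int) -> None:
    """Reject an impossible request before any pod is created.

    Failing here gives the user an immediate error instead of a pod that sits
    Pending until the spawn timeout expires.
    """
    if requested < 0:
        raise ValueError("GPU request must not be negative")
    if requested > max_available:
        raise ValueError(
            f"Requested {requested} GPU(s), but only {max_available} available"
        )


# user2's view once user1 holds the only GPU.
print(gpu_dropdown_options(0))  # [0]
try:
    validate_gpu_request(1, 0)
except ValueError as err:
    print(err)  # Requested 1 GPU(s), but only 0 available
```

A check like this is best effort: another user can claim the GPU between rendering the form and spawning, so the Kubernetes scheduler still has the final say.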

Reproducibility (Always/Intermittent/Only Once):

Always

Build Details:

RHODS 1.7.0-5 on OCP 4.10 rc7

Workaround:

No real workaround; user 1 can unblock user 2 by stopping their own server and freeing up the GPU.

Additional info:

mentioned on

Merge request - Updated 4 upstream sources

Merge request - Updated 5 upstream sources

Assignee:: Vaishnavi Hire

Reporter:: Luca Giorgi

Votes:: 0

Watchers:: 9

Created:: 2022/03/02 11:19 AM

Updated:: 2023/02/17 8:24 PM

Resolved:: 2022/04/21 4:51 PM
