Bug
Resolution: Done
Critical
1.9.0-7
MODH Sprint 1.9
Description of problem:
Attaching 2 GPU nodes with 1 GPU each to a cluster results in the JH Spawner showing 2 GPUs available for use.
Trying to spawn with 2 GPUs results in an error message about insufficient resources, because no single node can satisfy a 2-GPU request, and the user is stuck until the 10-minute timeout is reached.
Prerequisites (if any, like setup, operators/versions):
RHODS 1.7.0-5 on OSD, running OCP 4.9, GPU operator v. 1.8.3
Steps to Reproduce:
- Provision 2 GPU nodes on cluster with 1 GPU each
- Wait for GPU operator to discover them
- Visit JH Spawner
- Check number of available GPUs
- Try spawning with maximum number of GPUs available
Actual results:
JH Spawner reports 2 GPUs available (the sum of GPUs in each node).
Trying to spawn a server while requesting both GPUs fails, and the user is stuck until the 10-minute timeout expires.
Expected results:
JH Spawner should report 1 GPU available (the maximum number of GPUs in any single node).
User should not be blocked for 10 minutes.
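The difference between the actual and expected counts can be illustrated with a minimal sketch. This is not the spawner's real code: the function names and the `nodes` mapping are hypothetical, and the per-node counts are assumed to have been read beforehand (for example from each node's allocatable `nvidia.com/gpu` resource). Since a notebook pod is scheduled onto a single node, the largest request that can succeed is the per-node maximum, not the cluster-wide sum.

```python
from typing import Mapping


def summed_gpus(allocatable_per_node: Mapping[str, int]) -> int:
    """Cluster-wide total: the value the spawner currently appears to show."""
    return sum(allocatable_per_node.values())


def max_gpus_per_pod(allocatable_per_node: Mapping[str, int]) -> int:
    """Largest GPU count a single pod could request and still be schedulable.

    A pod lands on exactly one node, so the limit is the maximum
    allocatable GPU count on any single node (0 if there are no GPU nodes).
    """
    return max(allocatable_per_node.values(), default=0)


# Hypothetical cluster matching the reproduction steps: 2 nodes, 1 GPU each.
nodes = {"gpu-node-a": 1, "gpu-node-b": 1}

print(summed_gpus(nodes))       # 2 (actual result reported in this bug)
print(max_gpus_per_pod(nodes))  # 1 (expected result)
```

A fuller version would presumably also subtract GPUs already claimed by running pods on each node before taking the maximum; that is left out here to keep the sketch focused on sum versus maximum.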
Reproducibility (Always/Intermittent/Only Once):
Always
Build Details:
Workaround:
None
Additional info:
- mentioned on