Type: Bug
Resolution: Unresolved
Priority: Major
We have an issue with autoscaling GPU nodes.
If a user requests a notebook with at least one GPU and no currently running node can accept it, a GPU node is correctly scaled up. However, until the NVIDIA DCGM exporter is up on that node, the spawned notebook pod keeps reporting a lack of GPUs, so the cluster autoscales several more nodes until at least one node has a running exporter and accepts the notebook pod.
It is a known issue: https://access.redhat.com/solutions/6055181
Related RHODS issue: https://issues.redhat.com/browse/RHODS-4617
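
The over-scaling described above can be illustrated with a toy model. This is only a sketch under assumptions that are not in this report: the function name `simulate_scale_up`, the fixed re-check interval, and the example timings are all hypothetical, and the real cluster autoscaler logic is more involved. The model adds one node on each check while the notebook pod is still pending, and the pod stops being pending only once the first node has had time to advertise its GPUs.

```python
def simulate_scale_up(gpu_ready_delay_s: int, autoscaler_interval_s: int) -> int:
    """Toy model: count nodes added before the first node advertises its GPUs.

    gpu_ready_delay_s     -- hypothetical time from node creation until the
                             GPU stack (including the exporter) is up
    autoscaler_interval_s -- hypothetical interval between autoscaler checks
    """
    if autoscaler_interval_s <= 0:
        raise ValueError("autoscaler_interval_s must be positive")

    nodes_added = 0
    now = 0
    first_node_ready_at = None

    while True:
        # The pod is scheduled as soon as the first node reports its GPUs.
        if first_node_ready_at is not None and now >= first_node_ready_at:
            return nodes_added

        # Pod is still pending for lack of GPUs, so another node is added.
        nodes_added += 1
        if first_node_ready_at is None:
            first_node_ready_at = now + gpu_ready_delay_s

        now += autoscaler_interval_s


if __name__ == "__main__":
    # Assumed timings, for illustration only.
    print(simulate_scale_up(gpu_ready_delay_s=300, autoscaler_interval_s=60))
    print(simulate_scale_up(gpu_ready_delay_s=30, autoscaler_interval_s=60))
```

In this model the number of extra nodes grows with the ratio of GPU readiness delay to check interval, which matches the symptom reported here: the longer the exporter takes to come up, the more surplus nodes are created.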