  1. Red Hat OpenShift Data Science
  2. RHODS-4617

Current implementation of GPUs in Notebook prevents auto-scaling from working for GPU nodes


      == The GPU drop-down was only visible if there were GPUs available
      Previously, the GPU drop-down was only visible on the notebook spawner page if GPU nodes were available.
      The GPU drop-down now also correctly displays if an autoscaling machine pool is defined in the cluster, even if no GPU nodes are currently available, possibly resulting in the provisioning of a new GPU node on the cluster.
    • Bug Fix
    • No
    • Yes
    • None
    • RHODS 1.18, RHODS 1.19

      Description of problem:

      In JupyterHub, a user currently sees the GPU drop-down only if both of the following are true:

      1) there are GPU nodes running
      2) there are spare GPUs that are not used by anyone

      This means that if I have an autoscaling GPU node pool with min=1 and max=10, I will have 1 available GPU (assuming one GPU per node). If user A takes that GPU, the GPU drop-down is not displayed when user B arrives. User B therefore cannot request a GPU, so auto-scaling is never triggered.
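The scenario above can be sketched as follows. This is a minimal illustration of the reported behavior, not the actual spawner code: `GpuNode`, `free_gpus`, and `dropdown_visible_current` are hypothetical names, and the sketch assumes one GPU per node.

```python
from dataclasses import dataclass


@dataclass
class GpuNode:
    """Hypothetical model of a running GPU node (not a real RHODS type)."""
    total_gpus: int
    used_gpus: int = 0


def free_gpus(nodes):
    """Count GPUs on running nodes that no user has claimed."""
    return sum(n.total_gpus - n.used_gpus for n in nodes)


def dropdown_visible_current(nodes):
    """Reported behavior: show the drop-down only if a spare GPU exists."""
    return len(nodes) > 0 and free_gpus(nodes) > 0


# Autoscaling pool with min=1, max=10: only the minimum node is running.
pool = [GpuNode(total_gpus=1)]
visible_for_a = dropdown_visible_current(pool)  # user A is offered the GPU
pool[0].used_gpus = 1                           # user A takes the only GPU
visible_for_b = dropdown_visible_current(pool)  # user B is not offered one
```

Because the check only looks at GPUs on nodes that are already running, the pool's max=10 never enters the decision, which is why user B has no way to ask for a GPU.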

      Prerequisites (if any, like setup, operators/versions):

      RHODS 1.13

      Steps to Reproduce

      1. As user A, start a notebook that takes the only available GPU
      2. As user B, open the notebook spawner page and confirm that the GPU option is no longer shown

      Actual results:

      The GPU drop-down is visible only if GPUs are currently available.

      Expected results:

      The GPU drop-down should be visible even if no GPUs are currently available, because requesting one, even when none is free, can trigger auto-scaling to add another GPU machine.
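One possible way to express the expected behavior is sketched below. The function name and its three parameters are hypothetical, and the rule that the drop-down is hidden once the pool has reached its maximum size is an assumption made for illustration; the report itself only asks that the drop-down stay visible when auto-scaling could add a GPU machine.

```python
def dropdown_visible_expected(free_gpus, running_gpu_nodes, pool_max_nodes):
    """Show the drop-down if a GPU is free now or the pool can still grow.

    All three parameters are hypothetical inputs used for illustration.
    """
    # Assumption: the autoscaler can add a node while the pool is below max.
    can_scale_up = running_gpu_nodes < pool_max_nodes
    return free_gpus > 0 or can_scale_up


# Pool with min=1, max=10; the single running GPU is already taken.
busy_but_scalable = dropdown_visible_expected(
    free_gpus=0, running_gpu_nodes=1, pool_max_nodes=10
)
# Pool at its maximum size with every GPU in use: nothing left to offer.
exhausted = dropdown_visible_expected(
    free_gpus=0, running_gpu_nodes=10, pool_max_nodes=10
)
```

With this rule, user B in the scenario from the description would still see the drop-down, and selecting a GPU would leave a pending request that the autoscaler can act on.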

      Reproducibility (Always/Intermittent/Only Once):

      Always

      Build Details:

      RHODS 1.13. 

      Workaround:

      Do not use autoscaling, and keep a large number of GPU machines running at all times, which is very costly.

      Additional info:

        1. image-2022-11-10-16-03-37-725.png (12 kB)
        2. gpu6.png (144 kB)
        3. gpu5.png (54 kB)
        4. gpu4.png (26 kB)
        5. gpu3.png (106 kB)
        6. gpu2.png (162 kB)
        7. gpu1.png (58 kB)

              rh-ee-mroman Maros Roman (Inactive)
              egranger@redhat.com Erwan Granger
              Luca Giorgi Luca Giorgi