Red Hat OpenShift Data Science / RHODS-3190

GPU selection drop-down shows 2 GPUs available with two 1-GPU nodes attached

    • MODH Sprint 1.9

      Description of problem:

      Attaching two GPU nodes with 1 GPU each to a cluster results in the JupyterHub (JH) Spawner showing 2 GPUs available for use.

      Trying to spawn with 2 GPUs fails with an error message about insufficient resources, and the user is stuck until the 10-minute timeout is reached.
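
The failure is consistent with how Kubernetes schedules a pod: the pod's whole resource request has to be satisfied by one node, so GPUs on different nodes cannot be combined. A minimal sketch of that check follows. The node names and the `fits_on_any_node` helper are hypothetical and only illustrate the reasoning; this is not RHODS or scheduler code.

```python
# Hypothetical free-GPU counts per node, mirroring the setup in this report.
free_gpus_per_node = {"gpu-node-1": 1, "gpu-node-2": 1}


def fits_on_any_node(requested_gpus, free_gpus):
    """Return True if at least one single node can satisfy the whole request."""
    return any(free >= requested_gpus for free in free_gpus.values())


# A 1-GPU request fits on either node; a 2-GPU request fits on neither.
print(fits_on_any_node(1, free_gpus_per_node))
print(fits_on_any_node(2, free_gpus_per_node))
```

Under these assumptions, a request for 2 GPUs stays unschedulable even though the cluster holds 2 GPUs in total.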

      Prerequisites (if any, like setup, operators/versions):

      RHODS 1.7.0-5 on OSD, running OCP 4.9, GPU operator v. 1.8.3

      Steps to Reproduce

      1. Provision 2 GPU nodes on cluster with 1 GPU each
      2. Wait for GPU operator to discover them
      3. Visit JH Spawner
      4. Check number of available GPUs
      5. Try spawning with maximum number of GPUs available

      Actual results:

      JH Spawner reports 2 GPUs available (the sum of the GPUs across all nodes).
      Trying to spawn a server while requesting both GPUs fails, and the user is stuck until 10 minutes pass.

      Expected results:

      JH Spawner should report 1 GPU available (the maximum number of GPUs in any single node).
      User should not be blocked for 10 minutes.
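
The difference between the actual and expected values can be sketched as a sum versus a maximum over per-node GPU counts. The function name `gpu_dropdown_limits` and the input mapping are assumptions made for illustration; the spawner's real code may gather and compute these numbers differently.

```python
def gpu_dropdown_limits(allocatable_gpus_per_node):
    """Return (cluster_total, largest_single_node) for a node -> GPU count mapping.

    cluster_total is the value the report describes as currently shown;
    largest_single_node is the value the report expects instead.
    """
    counts = list(allocatable_gpus_per_node.values())
    cluster_total = sum(counts)
    # default=0 covers a cluster with no GPU nodes at all.
    largest_single_node = max(counts, default=0)
    return cluster_total, largest_single_node


# Hypothetical node names matching the reproduction steps: two nodes, 1 GPU each.
total, per_node_max = gpu_dropdown_limits({"gpu-node-1": 1, "gpu-node-2": 1})
print(total, per_node_max)
```

For the setup in this report, the sum is 2 while the per-node maximum is 1, which matches the reported and expected values respectively.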

      Reproducibility (Always/Intermittent/Only Once):

      Always

      Build Details:

      Workaround:

      None

      Additional info:

        1. GPU-0.png (6 kB)
        2. GPU-1.png (8 kB)
        3. GPU-node2.png (47 kB)
        4. GPUs-bugfix.png (524 kB)

              vhire Vaishnavi Hire
              rhn-support-lgiorgi Luca Giorgi