  1. Red Hat OpenShift Data Science
  2. RHODS-3069

GPU selection remains even after GPU nodes are gone

    • False
    • False
    • No
    • No
    • When a user provisioned a notebook server with GPU support, and the utilized GPU nodes were subsequently removed from the cluster, the user could not create a notebook server. This occurred because the most recently used setting for the number of attached GPUs was used by default.
    • Documented as Resolved Issue
    • No
    • Yes
    • None
    • MODH Sprint 1.9, MODH Sprint 1.10

      Description of problem:

      • A non-GPU pod can no longer be spawned because the UI remembers the last choices made.

      Prerequisites (if any, like setup, operators/versions):

      Steps to Reproduce

      1. Working in "stage".
      2. The cluster had 1 node with GPUs.
      3. Successfully spawned a notebook using 1 GPU.
      4. Closed the notebook.
      5. Removed the GPU node from the cluster (to save cost during the night).
      6. The next day, tried to spawn a notebook.
      7. The GPU node had not been re-added yet.
      8. The spawner display therefore offered no GPU selection (only container size).
      9. However, the spawned pod still "remembered" the last settings (gpu: 1).
      10. The pod never ran, because no GPU node was available.

      Actual results:

      Pod is pending until either timeout or GPU node is added back into cluster.
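
      The pending state can be illustrated with a small sketch. It assumes the GPU is requested through the Kubernetes extended resource name `nvidia.com/gpu`; the function names, the placeholder container size, and the node dictionaries are hypothetical and are not taken from the RHODS code base.

```python
# Hypothetical sketch: names and data shapes are illustrative, not RHODS code.
GPU_RESOURCE = "nvidia.com/gpu"  # assumed extended-resource name


def build_limits(remembered_gpus: int) -> dict:
    """Build container resource limits from the last-used GPU setting."""
    limits = {"cpu": "1", "memory": "2Gi"}  # placeholder container size
    if remembered_gpus > 0:
        limits[GPU_RESOURCE] = str(remembered_gpus)
    return limits


def is_schedulable(limits: dict, nodes: list) -> bool:
    """Return True if at least one node can satisfy the GPU limit."""
    wanted = int(limits.get(GPU_RESOURCE, 0))
    return any(int(node.get(GPU_RESOURCE, 0)) >= wanted for node in nodes)


# After the GPU node is removed, only CPU nodes remain.
cpu_only_nodes = [{"cpu": "4"}]
stale_limits = build_limits(remembered_gpus=1)
print(is_schedulable(stale_limits, cpu_only_nodes))  # the stale "gpu: 1" cannot be satisfied
```

      Under these assumptions, the remembered value of 1 ends up in the pod limits even though the UI no longer shows a GPU selector, so no node can host the pod.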

      Expected results:

      Pod should not request GPUs when none are available. 
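
      One possible way to get the expected behavior is to clamp the remembered value against what the cluster currently offers before building the pod spec. This is only a sketch, not the actual fix: `effective_gpu_count` and both of its parameters are hypothetical names, and how the available GPU count is obtained is left out.

```python
def effective_gpu_count(remembered_gpus: int, available_gpus: int) -> int:
    """Clamp the remembered GPU count to what the cluster currently offers.

    Both arguments are assumptions for illustration: `remembered_gpus` is the
    user's last-used setting, `available_gpus` is the largest GPU count any
    node can currently provide.
    """
    if available_gpus <= 0:
        # No GPU nodes in the cluster: never request a GPU.
        return 0
    return max(0, min(remembered_gpus, available_gpus))


print(effective_gpu_count(remembered_gpus=1, available_gpus=0))
```

      Whether a remembered value larger than the available count should be silently reduced or reset to 0 is a design choice; the sketch reduces it.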

       

      Reproducibility (Always/Intermittent/Only Once):

      Only seen it once, but I'm pretty sure it's always going to happen. 

       

      Build Details:

      Not sure; 1.6.0 or 1.7.0.

      Workaround:

      • Add a GPU node, select 0 GPUs in the spawner, then remove the GPU node again.

      Additional info:

              llasmith@redhat.com Landon LaSmith
              egranger@redhat.com Erwan Granger
              Luca Giorgi