RHOAIENG-909 (Red Hat OpenShift AI Engineering)

[RFE] Cluster preparation for a large number of simultaneous Data Science users

      Problem Statement

      One typical scenario for RHODS is a university data science class where, at a set time, a large number of students (RHODS users) log in to RHODS and launch notebook pods. In an autoscaling-enabled environment, this results in a sub-optimal user experience: new nodes are only provisioned after the notebook pods are already queued, so users face long wait times before resources are available to host them.

      Proposed Solution

      Provide an 'easy button' UI in RHODS that will allow a cluster admin to scale the cluster at a predefined time in preparation for the above scenario. The UI can take the input requirements for scaling the cluster in two different ways:

      Direct 

      1. Instance type and count. For example: (m5.2xlarge, 6 : p3.4xlarge, 4)
      2. Time to trigger scaling
      3. Frequency (for example, one-time or recurring)
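
A minimal sketch of how the direct inputs might be represented, assuming instance pairs are written as `type, count` and separated by `:`. The class names, the `frequency` values, and the parser are hypothetical and only illustrate the shape of the request; they are not part of RHODS.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass(frozen=True)
class InstanceRequest:
    instance_type: str  # e.g. "m5.2xlarge"
    count: int          # number of nodes of this type


@dataclass(frozen=True)
class DirectScalingRequest:
    instances: Tuple[InstanceRequest, ...]
    trigger_at: datetime              # time to trigger scaling
    frequency: Optional[str] = None   # hypothetical values: "once", "weekly"


def parse_instances(spec: str) -> Tuple[InstanceRequest, ...]:
    """Parse 'm5.2xlarge, 6 : p3.4xlarge, 4' into InstanceRequest items."""
    items = []
    for part in spec.split(":"):
        instance_type, count_text = (field.strip() for field in part.split(","))
        count = int(count_text)
        if count < 1:
            raise ValueError(f"count must be positive, got {count}")
        items.append(InstanceRequest(instance_type, count))
    return tuple(items)
```

For example, `parse_instances("m5.2xlarge, 6 : p3.4xlarge, 4")` yields two `InstanceRequest` items that can be placed in a `DirectScalingRequest` together with the trigger time.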

      Indirect 

      1. Number of expected users
      2. Expected resource requests for each user
      3. GPU type and count. For example: (V100, 4 : T4, 8)

      For the indirect input option, RHODS will perform its own calculations to scale the cluster to satisfy the requirements.
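
One way the indirect calculation could work is sketched below, under stated assumptions: every user launches one notebook pod with identical resource requests, all nodes are the same size, and the allocatable capacity per node is known. The names `Resources` and `nodes_needed` and the capacity figures are hypothetical, not an actual RHODS algorithm.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpu: float         # CPU cores
    memory_gib: float  # memory in GiB
    gpus: int = 0      # whole GPUs


def nodes_needed(users: int, per_user: Resources, node_allocatable: Resources) -> int:
    """Nodes required to host one notebook pod per user on identical nodes."""
    if users <= 0:
        return 0
    # For each resource the pod actually requests, how many pods fit on a node.
    fits = []
    if per_user.cpu > 0:
        fits.append(node_allocatable.cpu // per_user.cpu)
    if per_user.memory_gib > 0:
        fits.append(node_allocatable.memory_gib // per_user.memory_gib)
    if per_user.gpus > 0:
        fits.append(node_allocatable.gpus // per_user.gpus)
    if not fits:
        raise ValueError("per-user requests must include at least one resource")
    # The scarcest resource limits how many pods a node can host.
    pods_per_node = int(min(fits))
    if pods_per_node < 1:
        raise ValueError("a single notebook pod does not fit on one node")
    return math.ceil(users / pods_per_node)
```

With 60 expected users, each requesting 2 CPU cores and 8 GiB of memory, and an assumed 7.5 cores and 28 GiB allocatable per node, three pods fit per node, so `nodes_needed(60, Resources(2, 8), Resources(7.5, 28))` returns 20.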

      Although an OpenShift admin can script this functionality outside of RHODS, my sense is that making such value-add features available from within the RHODS admin dashboard will help establish RHODS as the platform of choice for collaborative data science workflows.
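
For context, the kind of script an admin might write today could look roughly like the sketch below. It assumes the cluster's worker nodes are managed through MachineSets in the `openshift-machine-api` namespace and only builds the `oc scale` command and the delay until the trigger time; the MachineSet name is hypothetical, and nothing is executed against a cluster.

```python
from datetime import datetime, timezone
from typing import List


def build_scale_command(machineset: str, replicas: int,
                        namespace: str = "openshift-machine-api") -> List[str]:
    """Build the `oc scale` invocation for one MachineSet (not executed here)."""
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    return ["oc", "scale", "machineset", machineset,
            f"--replicas={replicas}", "-n", namespace]


def seconds_until(trigger_at: datetime, now: datetime) -> float:
    """Seconds to wait before scaling; zero if the trigger time has passed."""
    return max(0.0, (trigger_at - now).total_seconds())


if __name__ == "__main__":
    # "mycluster-worker-us-east-1a" is a made-up MachineSet name.
    cmd = build_scale_command("mycluster-worker-us-east-1a", 6)
    wait = seconds_until(datetime(2030, 1, 7, 8, 30, tzinfo=timezone.utc),
                         datetime.now(timezone.utc))
    # A real script would sleep for `wait` seconds (or be started by a cron
    # job) and then run subprocess.run(cmd, check=True).
    print(f"in {wait:.0f}s: {' '.join(cmd)}")
```

Keeping command construction separate from execution makes the script easy to dry-run before the class starts.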


            Assignee: Unassigned
            Reporter: Ashish Kamra (akamra8979)
            Adriel Paredes, Jacqueline Koehler, Jeff DeMoss, Kezia Cook
            Votes: 0
            Watchers: 4