Red Hat OpenShift Data Science

RHODS-163: Notebook server operations

      Data Science users can perform the primary functions of data preparation and model development within Jupyter notebooks. This epic covers requirements for standard notebook capabilities and operations.

      Requirements:

      1. P0: The system must support the ability to launch JupyterLab for notebook creation and access to existing notebooks and other files within a notebook server environment. 
      2. P0: New notebook servers must include appropriate packages and libraries based on the selected notebook image. Note: the images and packages are defined in the 'Support notebook images' epic.
      3. P0: The system must support the ability to import a new notebook file from a local device.
      4. P2: The system must support the ability to import a new notebook file from a specified URL.
      5. P0: The system must support the ability to build models using tools based on the notebook image associated with the notebook server. For example, if the server uses a TensorFlow GPU image, the notebook must be able to build models using TensorFlow and utilize GPUs for compute-intensive processes.
      6. P0: Notebooks must be able to utilize environment variables defined as part of the notebook server configuration. This includes access to data (e.g., in S3) and the use of services (e.g., Managed Kafka or ISV services); see the sketch after this list.
      7. P1: The system must support the ability for multiple users with access to a notebook server to access the same data in S3. Note: this assumes the notebook server environment is connected to an S3 account, and covers different notebook servers accessing the same data.
      8. P0: The system must support the ability to bring an existing model into a new notebook.
      9. P1: The system must provide detailed, user-facing error messages with information on how to resolve the issue (e.g., for insufficient memory resources, what should the Data Science user do?).
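
      A minimal sketch of how requirement 6 might look from inside a notebook, assuming the notebook server injects standard AWS-style environment variables and the image provides boto3 (the variable names, bucket, and object key below are illustrative assumptions, not values defined by this epic):

      ```python
      import os
      import boto3

      # Build an S3 client from credentials injected into the notebook server
      # environment; the variable names here are assumptions, since the epic
      # does not prescribe a naming convention.
      s3 = boto3.client(
          "s3",
          endpoint_url=os.environ.get("AWS_S3_ENDPOINT"),  # optional non-AWS endpoint
          aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
          aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
      )

      # Requirement 7 follows from the same mechanism: every user of the
      # server resolves the same credentials, so every notebook sees the
      # same bucket contents.
      obj = s3.get_object(Bucket=os.environ["AWS_S3_BUCKET"], Key="data/train.csv")
      print(obj["Body"].read(100))
      ```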


      Test cases:

      1. Verify that libraries from auto-installed packages can be imported and used.
      2. Ability to query and visualize data in S3.
      3. Ability to create new datasets in S3 (e.g., add new columns, filter columns, remove rows).
      4. Split data into training and validation sets.
      5. Build models using TensorFlow and PyTorch.
      6. Verify PyTorch and TensorFlow performance with data in S3.
      7. Train a model in a notebook server with GPUs and verify that the GPUs are utilized during training; see the sketch after this list.
      8. Test/validate the model.
      9. Verify that multiple users with access to a notebook server can access the same data in S3; a single set of access credentials in environment variables should enable this.
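
      A sketch of how test cases 4, 5, and 7 might be exercised in a TensorFlow GPU notebook image; this is illustrative, not a prescribed test script:

      ```python
      import tensorflow as tf

      # Test case 7: confirm the GPU image actually exposes GPUs to the notebook.
      gpus = tf.config.list_physical_devices("GPU")
      assert gpus, "expected at least one visible GPU in a TensorFlow GPU image"

      # Test case 4: split synthetic data into training and validation sets.
      x = tf.random.normal((1000, 4))
      y = tf.random.normal((1000, 1))
      x_train, x_val = x[:800], x[800:]
      y_train, y_val = y[:800], y[800:]

      # Test case 5: build and train a small model; with a GPU visible,
      # Keras places the computation on /GPU:0 by default.
      model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
      model.compile(optimizer="sgd", loss="mse")
      model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=1, verbose=0)
      print("validation loss:", model.evaluate(x_val, y_val, verbose=0))
      ```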
