Red Hat OpenShift Data Science: RHODS-144

Support notebook images



      When data science users create notebooks, they need some control over the notebook's initial state. They want a consistent starting point for notebooks, and they want new notebooks to automatically include the packages they need for their data science use cases.

      Requirements:

      1. P0: The system must provide default notebook images optimized for the latest compatible release of each of the following bundles:

      • Minimal Python
      • Standard Data Science
      • TensorFlow (will work for CPU & GPU)
      • TensorFlow GPU
      • PyTorch (will work for CPU & GPU)
      • PyTorch GPU
      • CUDA (with a specific version)

      2. P0: All of the notebook images in requirement 1 above except Minimal Python must include the latest compatible versions of the packages listed below. Note: notebook images will not be specifically optimized for these packages.

      • Boto3
      • Kafka-python
      • <RDS, RedShift>
      • Pandas
      • Matplotlib
      • NumPy
      • SciPy
      • Current set of dependencies in upstream ODH scipy image
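      A check for requirement 2 could be run inside each built image to confirm that the listed packages are present and to record their versions (which could also feed the help content on supported versions). The following is a minimal sketch, not the project's actual tooling: the distribution names in REQUIRED_PACKAGES are assumptions derived from the list above, the unresolved <RDS, RedShift> entry is left out, and the function names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed distribution names for the packages named in requirement 2.
# The "<RDS, RedShift>" entry is unresolved in the epic and is omitted here.
REQUIRED_PACKAGES = ["boto3", "kafka-python", "pandas", "matplotlib", "numpy", "scipy"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def missing_packages(packages):
    """Return the distribution names that are not installed in this environment."""
    return [name for name, ver in installed_versions(packages).items() if ver is None]


if __name__ == "__main__":
    # Print a simple report that could be captured during an image build.
    for name, ver in installed_versions(REQUIRED_PACKAGES).items():
        print(f"{name}: {ver or 'MISSING'}")
```

      A non-empty result from missing_packages(REQUIRED_PACKAGES) would indicate that the image does not yet satisfy requirement 2.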

      3. P0: The system must be able to upgrade all components of notebook images based on the defined supported versions.

      4. P0: The system must prevent users from editing or modifying the standard supported images in requirement 1. The rationale is to ensure consistent, supported images with specific packages and versions. In the future, we will enable customers to create their own images.

      5. P1: All of the notebook images in requirement 1 above except Minimal Python must include the latest compatible version of scikit-learn.

       

      Considerations/questions:

      • Users will keep moving forward with newer versions of components. We need to determine how long we support the previous version when a new version introduces breaking changes or is a major release.
      • Should we include other packages, e.g. Seaborn or sklearn?
      • Need to define the specific supported releases.
      • Need a list of supported versions for each package and software component, made available as help content.
      • The list of available images is fixed.
      • Need to validate the list of packages against what is most needed: check against Sophie's list; find out whether we can see what packages users are installing; consider the ability to request a new package.

      Most popular python libraries: 1) numpy; 2) pandas; 3) matplotlib; 4) sklearn (scikit-learn); 5) os; 6) seaborn; 7) scipy

      https://blog.jetbrains.com/datalore/2020/12/17/we-downloaded-10-000-000-jupyter-notebooks-from-github-this-is-what-we-learned/ 
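      One way to gather comparable data for our own users would be to count which modules are imported across saved notebooks. The sketch below is an illustration only, assuming notebooks are available as parsed .ipynb JSON (nbformat 4 layout with "cells", "cell_type", and "source"); the function names are hypothetical. Note that it measures imports rather than installs, and import names can differ from package names (for example, sklearn vs. scikit-learn).

```python
import ast
from collections import Counter


def imported_modules(source):
    """Return the top-level module names imported by a string of Python code."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # Cells containing IPython magics or invalid code are skipped.
        return set()
    modules = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module.split(".")[0])
    return modules


def count_imports(notebooks):
    """Count, per module, the number of notebooks that import it at least once."""
    counts = Counter()
    for nb in notebooks:
        seen = set()
        for cell in nb.get("cells", []):
            if cell.get("cell_type") != "code":
                continue
            source = cell.get("source", "")
            if isinstance(source, list):  # nbformat allows a list of lines
                source = "".join(source)
            seen |= imported_modules(source)
        counts.update(seen)
    return counts
```

      Calling count_imports(...).most_common() on a collection of notebooks loaded with json.load would give a ranked list that can be compared against the package list in requirement 2.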

      • Get metrics on what packages users are installing.
      • Might need to notify users and provide guidance on resetting the notebook server.
      • Separate epic for notebook server lifecycle.
      • Need to determine the timing for incorporating the latest released versions.

      latest version sheet here

      • Do we need to include the specific CUDA version in the image name? See the supported version sheet linked above.
      • 3/29/21: We're now planning to have only one image each for TensorFlow and PyTorch. The images will work for both CPU & GPU.
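      With a single image serving both CPU and GPU, a notebook would need to detect at runtime whether a GPU is actually present. The following is a rough sketch of such a check, not part of the planned images: the function name gpu_report is hypothetical, and it relies only on torch.cuda.is_available() and tf.config.list_physical_devices("GPU"), reporting None for a framework that is not installed.

```python
import importlib


def gpu_report():
    """Report GPU visibility per framework: True, False, or None if not installed."""
    report = {}
    try:
        torch = importlib.import_module("torch")
        report["torch"] = bool(torch.cuda.is_available())
    except ImportError:
        report["torch"] = None  # PyTorch is not installed in this image
    try:
        tf = importlib.import_module("tensorflow")
        report["tensorflow"] = len(tf.config.list_physical_devices("GPU")) > 0
    except ImportError:
        report["tensorflow"] = None  # TensorFlow is not installed in this image
    return report


if __name__ == "__main__":
    for framework, has_gpu in gpu_report().items():
        print(f"{framework}: {has_gpu}")
```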

            Jacqueline Koehler (jkoehler@redhat.com)
            Jeff DeMoss (jdemoss@redhat.com)
            Luca Giorgi