RHODS-2747: Images are not correctly updated after RHODS upgrade


      Release note: Images were incorrectly updated after upgrading OpenShift Data Science: After the process to upgrade OpenShift Data Science completed, JupyterHub failed to update its notebook images. This was due to an issue with the image caching mechanism. Images are now correctly updated after an upgrade.

      Description of problem:

      During a RHODS upgrade from 1.3 to 1.5 we noticed that some of the images are not correctly updated. Specifically, we have confirmed this behaviour for the Standard Data Science, PyTorch, and TensorFlow images, but it likely applies to all images.

      What is happening is that during an upgrade the ImageStreams are correctly updated: the OCP console shows newer timestamps, the ImageStream metadata is updated to the latest version, and the spawner tooltip and description are updated. However, if an image is already cached on the worker node where a new server is spawned after the RHODS upgrade is over, the new image is not pulled again; the locally cached copy is used instead.

      Because of this, there are discrepancies between what is shown in the spawner and what is actually installed in the image. In this specific case we have seen tensorflow and tensorboard installed at version 2.4.1 instead of 2.7 and 2.6 respectively, and jupyterlab/notebook installed at the old RHODS 1.3 versions (3.0.16/6.4.4), which are affected by a known CVE.
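      From a terminal inside the spawned server the discrepancy can be confirmed directly. A minimal sketch, assuming the packages were installed with pip:

      # Versions actually installed in the running server; on an affected node these
      # are still the RHODS 1.3 versions (e.g. tensorflow 2.4.1, jupyterlab 3.0.16,
      # notebook 6.4.4), despite the updated spawner description.
      $ pip show tensorflow tensorboard jupyterlab notebook | grep -E '^(Name|Version)'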

      Removing the locally cached image forces RHODS to pull the newer one, which fixes the issue. I believe this is because the image pull policy is set to `IfNotPresent` and the images in question keep the same tag even when the image content has changed. This is similar to what was observed in RHODS-2162.
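      The pull policy can be inspected on a spawned server pod. A quick sketch; the pod name and namespace are assumptions based on the default JupyterHub naming and may need adjusting:

      # Prints the pull policy of the single-user server container.
      # jupyterhub-nb-<username> and the namespace are assumptions; adjust as needed.
      $ oc get pod jupyterhub-nb-<username> -n redhat-ods-applications \
          -o jsonpath='{.spec.containers[*].imagePullPolicy}{"\n"}'
      IfNotPresent

      With `IfNotPresent` and an unchanged tag, the kubelet treats the cached copy as a match and never contacts the registry, so pulling with `Always` or tagging images per release would force a fresh pull.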

      This is an example of what we are seeing: the node cache still holds the old images, while the ImageStream data shows the updated ones.

      ImageStream data:

      $ oc get is -n redhat-ods-applications
      NAME                                IMAGE REPOSITORY                                                                                                                        TAGS                UPDATED
      11.0.3-cuda-s2i-base-ubi8           default-route-openshift-image-registry.apps.ods-qe-upgr.4ujn.s1.devshift.org/redhat-ods-applications/11.0.3-cuda-s2i-base-ubi8          latest              4 hours ago
      11.0.3-cuda-s2i-core-ubi8           default-route-openshift-image-registry.apps.ods-qe-upgr.4ujn.s1.devshift.org/redhat-ods-applications/11.0.3-cuda-s2i-core-ubi8          latest              4 hours ago
      11.0.3-cuda-s2i-py38-ubi8           default-route-openshift-image-registry.apps.ods-qe-upgr.4ujn.s1.devshift.org/redhat-ods-applications/11.0.3-cuda-s2i-py38-ubi8          latest              4 hours ago
      11.0.3-cuda-s2i-thoth-ubi8-py38     default-route-openshift-image-registry.apps.ods-qe-upgr.4ujn.s1.devshift.org/redhat-ods-applications/11.0.3-cuda-s2i-thoth-ubi8-py38    latest              4 hours ago
      minimal-gpu                         default-route-openshift-image-registry.apps.ods-qe-upgr.4ujn.s1.devshift.org/redhat-ods-applications/minimal-gpu                        py3.8-cuda-11.0.3   4 hours ago
      nvidia-cuda-11.0.3                  default-route-openshift-image-registry.apps.ods-qe-upgr.4ujn.s1.devshift.org/redhat-ods-applications/nvidia-cuda-11.0.3                 latest              5 hours ago
      pytorch                             default-route-openshift-image-registry.apps.ods-qe-upgr.4ujn.s1.devshift.org/redhat-ods-applications/pytorch                            py3.8-cuda-11.0.3   4 hours ago
      s2i-generic-data-science-notebook   default-route-openshift-image-registry.apps.ods-qe-upgr.4ujn.s1.devshift.org/redhat-ods-applications/s2i-generic-data-science-notebook  py3.8               4 hours ago
      s2i-minimal-notebook                default-route-openshift-image-registry.apps.ods-qe-upgr.4ujn.s1.devshift.org/redhat-ods-applications/s2i-minimal-notebook               py3.8               4 hours ago
      tensorflow                          default-route-openshift-image-registry.apps.ods-qe-upgr.4ujn.s1.devshift.org/redhat-ods-applications/tensorflow                         py3.8-cuda-11.0.3   4 hours ago
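      One way to make the comparison concrete is to compare digests; a sketch, using the tensorflow tag from the listing above, with <node> as a placeholder for the worker node name and crictl assumed to be the container runtime CLI on it:

      # Digest the ImageStream points at after the upgrade
      $ oc get istag tensorflow:py3.8-cuda-11.0.3 -n redhat-ods-applications \
          -o jsonpath='{.image.metadata.name}{"\n"}'

      # Digest of the copy cached on the worker node
      $ oc debug node/<node> -- chroot /host crictl images --digests | grep tensorflow

      # On an affected node the two digests do not match: the cache still holds the
      # RHODS 1.3 build.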

      Prerequisites (if any, like setup, operators/versions):

      RHODS 1.3 with locally cached images, upgraded to RHODS 1.5

      Steps to Reproduce

      1. Install RHODS 1.3
      2. Spawn servers with all images so that they are cached locally on the worker nodes
      3. Upgrade RHODS to 1.5
      4. Confirm that the ImageStreams have been correctly updated
      5. Spawn a server with one of the cached images on the same worker node
      6. Verify that the spawner is not pulling the new image but is using the cached copy (see the events check below)
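      One way to check step 6 is through the pod events; a sketch, with <pod-name> as a placeholder for the spawned single-user server pod and the namespace assumed:

      # With imagePullPolicy: IfNotPresent and the image already cached, the kubelet
      # reports the image as already present instead of emitting a "Pulling" event.
      $ oc describe pod <pod-name> -n redhat-ods-applications | grep -iE 'pulling|pulled|already present'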

      Actual results:

      The servers that are being spawned still use the images provided with RHODS 1.3 rather than the updated ones for RHODS 1.5

      Expected results:

      Updated images are used

      Reproducibility (Always/Intermittent/Only Once):

      Always when upgrading from RHODS 1.3 to 1.5 with old images cached

      Build Details:

      RHODS 1.5.0-7 on OSD, OCP 4.8/4.9

      Workaround:

      Manually remove the cached images from the worker nodes
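      For example, a sketch with <node> and <image-ref> as placeholders; the image reference should match the entry reported by `crictl images` on that node:

      # Delete the stale cached copy so the next spawn pulls the updated image.
      $ oc debug node/<node> -- chroot /host crictl rmi <image-ref>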

      Additional info:
