Loading...

XML

Word

Printable

Type: Bug
Resolution: Done
Priority: Blocker
Fix Version/s: RHODS_1.6.0_GA
Affects Version/s: RHODS_1.5.0_GA
Component/s: Workbenches
Labels:
- eng
- groomed

Blocked:
False
Ready:
False
Automated:
No
CDW blocker:
CDW devel_ack:
CDW docs_ack:
CDW pm_ack:
CDW qa_ack:
CDW release:
Fixed in Build:
1.6.0-8
Regression:
No
Release Note Text:

Hide
Images were incorrectly updated after upgrading OpenShift Data Science:: After the process to upgrade OpenShift Data Science completed, JupyterHub failed to update its notebook images. This was due to an issue with the image caching mechanism. Images are now correctly updating after an upgrade.

Show
Images were incorrectly updated after upgrading OpenShift Data Science:: After the process to upgrade OpenShift Data Science completed, JupyterHub failed to update its notebook images. This was due to an issue with the image caching mechanism. Images are now correctly updating after an upgrade.
Target Release:

RHODS_1.6.0_GA
Test Blocker:
No
Test Coverage:

Yes
Watchlist Impact:
None
Market:
Test Link:
https://ODS-1070

SFDC Cases Counter:
SFDC Cases Links:

Description of problem:

During a RHODS upgrade from 1.3 to 1.5 we have noticed that some of the images are not correctly updated; Specifically we have confirmed this behaviour for Standard Data Science, PyTorch and Tensorflow, but it likely applies to all images.

What is happening is that during an upgrade, the ImageStreams are correctly updated, i.e. the OCP console will show newer timestamps, imagestream metadata will be updated to the latest version, spawner tooltip and description will be updated. However, if an image is already cached on a worker node where a new server will be spawned after the RHODS upgrade is over, the new image will not be pulled again but rather the locally cached copy will be used.

Because of this, there will be discrepancies between what is shown in the spawner and what is installed in the image; in this specific case, we have seen that tensorflow and tensorboard were installed at version 2.4.1 instead of 2.7/2.6 respectively, and jupyterlab/notebook were installed at the old versions for RHODS 1.3 (3.0.16/6.4.4) which are affected by a known CVE.

Removing the locally cached image forces RHODS to pull the newer one, which fixes this issue. I believe this is due to the fact that the image pull policy is set to `IfNotPresent` and that the images in question are using the same tag even when there have been changes to the images themselves. This is a similar issue to what was observed in RHODS-2162.

This is an example of what we are seeing, node cache:

ImageStream data:

$ oc get is -n redhat-ods-applications

11.0.3-cuda-s2i-base-ubi8           
default-route-openshift-image-registry.apps.ods-qe-upgr.4ujn.s1.devshift.org/redhat-ods-applications/11.0.3-cuda-s2i-base-ubi8
           latest              4 hours ago
11.0.3-cuda-s2i-core-ubi8           
default-route-openshift-image-registry.apps.ods-qe-upgr.4ujn.s1.devshift.org/redhat-ods-applications/11.0.3-cuda-s2i-core-ubi8
           latest              4 hours ago
11.0.3-cuda-s2i-py38-ubi8           
default-route-openshift-image-registry.apps.ods-qe-upgr.4ujn.s1.devshift.org/redhat-ods-applications/11.0.3-cuda-s2i-py38-ubi8
           latest              4 hours ago
11.0.3-cuda-s2i-thoth-ubi8-py38     
default-route-openshift-image-registry.apps.ods-qe-upgr.4ujn.s1.devshift.org/redhat-ods-applications/11.0.3-cuda-s2i-thoth-ubi8-py38
     latest              4 hours ago
minimal-gpu                         
default-route-openshift-image-registry.apps.ods-qe-upgr.4ujn.s1.devshift.org/redhat-ods-applications/minimal-gpu
                         py3.8-cuda-11.0.3   4 hours ago
nvidia-cuda-11.0.3                  
default-route-openshift-image-registry.apps.ods-qe-upgr.4ujn.s1.devshift.org/redhat-ods-applications/nvidia-cuda-11.0.3
                  latest              5 hours ago
pytorch                             
default-route-openshift-image-registry.apps.ods-qe-upgr.4ujn.s1.devshift.org/redhat-ods-applications/pytorch
                             py3.8-cuda-11.0.3   4 hours ago
s2i-generic-data-science-notebook   
default-route-openshift-image-registry.apps.ods-qe-upgr.4ujn.s1.devshift.org/redhat-ods-applications/s2i-generic-data-science-notebook
   py3.8               4 hours ago
s2i-minimal-notebook                
default-route-openshift-image-registry.apps.ods-qe-upgr.4ujn.s1.devshift.org/redhat-ods-applications/s2i-minimal-notebook
                py3.8               4 hours ago
tensorflow                          
default-route-openshift-image-registry.apps.ods-qe-upgr.4ujn.s1.devshift.org/redhat-ods-applications/tensorflow
                          py3.8-cuda-11.0.3   4 hours ago

Prerequisites (if any, like setup, operators/versions):

RHODS 1.3 with locally cached images, upgraded to RHODS 1.5

Steps to Reproduce

Install RHODS 1.3
Spawn server with all images to locally cache them
Upgrade RHODS to 1.5
Confirm that ImageStreams have been correctly updated
Try spawning a server with one of the cached images on the same worker node
Verify that the spawner is not pulling the new image but using the cached copy

Actual results:

The servers that are being spawned still use the images provided with RHODS 1.3 rather than the updated ones for RHODS 1.5

Expected results:

Updated images are used

Reproducibility (Always/Intermittent/Only Once):

Always when upgrading from RHODS 1.3 to 1.5 with old images cached

Build Details:

RHODS 1.5.0-7 on OSD, OCP 4.8/4.9

Workaround:

Manually remove the cached images from the worker nodes

Additional info:

- - Sort By Name
  - Sort By Date
  - Ascending
  - Descending
  - Thumbnails
  - List
  - Download All

Screenshot from 2022-01-24 17-15-51.png
44 kB
2022/01/24 5:15 PM

duplicates

RHODS-2364 Rebuilt Notebook images not picked up, due to image caching on nodes

Closed

relates to

RHODS-2748 Change Image versioning with Intel oneApi AiKit

Backlog

RHODS-1961 Image versioning issues with OpenVino

Review

mentioned on

Merge request - From RHODS-2747-RN-known-issue into master

Merge request - Merge branch 'RHODS-2747-changed-to-fixed-issues-images-not-correctly-updated-after-upgrade' into 'master'

Merge request - Merge branch... 'RHODS-2747-and-RHODS-2115-changed-to-fixed-issues-images-not-correctly-updated-after-upgrade' into stage-1-6

Merge request - Resolve RHODS-2747 "Rn known issue"

(2 mentioned on)

Assignee:: Samuel Veloso (Inactive)

Reporter:: Luca Giorgi

QA Contact:: Luca Giorgi

Votes:: 0 Vote for this issue

Watchers:: 8 Start watching this issue

Created:: 2022/01/24 5:14 PM

Updated:: 2023/02/17 9:25 PM

Resolved:: 2022/02/03 12:11 PM

Details

Description

Description of problem:

Prerequisites (if any, like setup, operators/versions):

Steps to Reproduce

Actual results:

Expected results:

Reproducibility (Always/Intermittent/Only Once):

Build Details:

Workaround:

Additional info:

Attachments

Attachments

Issue Links

Activity

People

Dates