  1. Red Hat OpenShift Data Science
  2. RHODS-2364

Rebuilt Notebook images not picked up, due to image caching on nodes


    • 1.6.0-8
    • MODH Sprint 37

      Description of problem:

      If a notebook image is rebuilt, its image tag stays the same, but its sha256 digest changes.
      This means that there can be two images with the same name and tag, and the only thing that differentiates them is the sha256 digest. (I agree that, theoretically, the images SHOULD be identical.)

      Prerequisites (if any, like setup, operators/versions):

       

      Steps to Reproduce

      1. Open a tensorflow notebook.
      2. Capture the imageID (down to the sha256 digest) used by the notebook:
        oc -n rhods-notebooks get pods -o yaml | grep -E 'image\:|imageID\:'
        image: image-registry.openshift-image-registry.svc:5000/redhat-ods-applications/tensorflow:py3.8-cuda-11.0.3      
        image: image-registry.openshift-image-registry.svc:5000/redhat-ods-applications/tensorflow:py3.8-cuda-11.0.3   
        imageID: image-registry.openshift-image-registry.svc:5000/redhat-ods-applications/tensorflow@sha256:f2c3a8a42d891135ec649c9f6633f6edca2fed6bb409a493820b20a604407178 
      3. Stop the notebook.
      4. Rebuild the tensorflow image.
      5. Capture the sha256 digest of the rebuilt image:
        oc -n redhat-ods-applications get ImageStreamTag tensorflow:py3.8-cuda-11.0.3 -o yaml | grep dockerImageReference  
        dockerImageReference: image-registry.openshift-image-registry.svc:5000/redhat-ods-applications/tensorflow@sha256:e3b267fe96d565cb69346c66ebb6add4b2dec45a698bdce9e4848593ec7f31b7 
      6. Restart a tensorflow notebook, and ensure it is scheduled on the same node as before.
      7. Capture the imageID (down to the sha256 digest) used by the notebook. It is still the old digest:
        oc -n rhods-notebooks get pods -o yaml | grep -E 'image\:|imageID\:'
        image: image-registry.openshift-image-registry.svc:5000/redhat-ods-applications/tensorflow:py3.8-cuda-11.0.3
        image: image-registry.openshift-image-registry.svc:5000/redhat-ods-applications/tensorflow:py3.8-cuda-11.0.3
        imageID: image-registry.openshift-image-registry.svc:5000/redhat-ods-applications/tensorflow@sha256:f2c3a8a42d891135ec649c9f6633f6edca2fed6bb409a493820b20a604407178 
      8. However, if I add a new node and spin up a notebook on it, the image gets pulled again, and this time the imageID matches the rebuilt image:
        oc -n rhods-notebooks get pods -o yaml | grep -E 'image\:|imageID\:' 
        image: image-registry.openshift-image-registry.svc:5000/redhat-ods-applications/tensorflow:py3.8-cuda-11.0.3
        image: image-registry.openshift-image-registry.svc:5000/redhat-ods-applications/tensorflow:py3.8-cuda-11.0.3
        imageID: image-registry.openshift-image-registry.svc:5000/redhat-ods-applications/tensorflow@sha256:e3b267fe96d565cb69346c66ebb6add4b2dec45a698bdce9e4848593ec7f31b7
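
The comparison done by eye in steps 2, 5 and 7 could be scripted. The following is only a sketch: `digest_of` and `is_stale` are helper names made up for illustration, and the two references are the values captured above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of RHODS or oc) to compare image digests.

# Print the digest part of a reference such as registry/ns/name@sha256:<hex>.
# Fails if the reference carries no sha256 digest (for example name:tag).
digest_of() {
  case "$1" in
    *@sha256:*) printf '%s\n' "${1##*@}" ;;
    *) return 1 ;;
  esac
}

# Succeed (exit 0) when the pod's imageID and the ImageStreamTag's
# dockerImageReference point at different digests, i.e. the node cache is stale.
# Exit 1 when they match, exit 2 when either reference has no digest.
is_stale() {
  pod_digest=$(digest_of "$1") || return 2
  tag_digest=$(digest_of "$2") || return 2
  [ "$pod_digest" != "$tag_digest" ]
}

# Values captured in steps 7 and 5 of this report.
pod_ref='image-registry.openshift-image-registry.svc:5000/redhat-ods-applications/tensorflow@sha256:f2c3a8a42d891135ec649c9f6633f6edca2fed6bb409a493820b20a604407178'
tag_ref='image-registry.openshift-image-registry.svc:5000/redhat-ods-applications/tensorflow@sha256:e3b267fe96d565cb69346c66ebb6add4b2dec45a698bdce9e4848593ec7f31b7'

if is_stale "$pod_ref" "$tag_ref"; then
  echo "stale: pod runs a different digest than the ImageStreamTag"
fi

# On a live cluster the two references could be read like this
# (POD is a placeholder for the notebook pod name):
#   pod_ref=$(oc -n rhods-notebooks get pod "$POD" \
#     -o jsonpath='{.status.containerStatuses[0].imageID}')
#   tag_ref=$(oc -n redhat-ods-applications get imagestreamtag \
#     tensorflow:py3.8-cuda-11.0.3 -o jsonpath='{.image.dockerImageReference}')
```

Comparing digests rather than tags is the point here, since the tag is identical before and after the rebuild.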
         

      Actual results:

      Because image f2c3a8a42 is cached on the node, any pod scheduled there keeps using it instead of the "newer" e3b267.

       

      Expected results:

      As soon as a new image is created, whether or not it differs from the previous one, I'd expect all new notebooks to use it.

      Reproducibility (Always/Intermittent/Only Once):

      Always

      Build Details:

      RHODS 1.2.0-2

      Workaround:

      You have to manually delete the older version of the image that is cached on the node. Once you do, the next notebook start triggers a pull, and the pull grabs the latest version of the image.
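
One possible way to do that deletion is sketched below. It assumes the node runs CRI-O with `crictl` available on the host and that the user may run `oc debug` against nodes. `remove_cached_image` is a name made up for illustration, and the node name and image reference are supplied by the caller.

```shell
#!/bin/sh
# Hypothetical helper: remove one cached image from one node so that the
# next pod scheduled there has to pull the image again.
#   $1 = node name
#   $2 = image reference or ID as known to the node (for example the imageID)
remove_cached_image() {
  node=$1
  image_ref=$2
  # ${OC:-oc} allows a dry run, e.g. OC=echo prints the command instead.
  ${OC:-oc} debug "node/$node" -- chroot /host crictl rmi "$image_ref"
}

# Dry run with placeholder values; drop OC=echo to execute for real.
OC=echo remove_cached_image "worker-0" "tensorflow@sha256:f2c3a8a4"
```

The removal will probably fail while a container on that node still uses the image, so running notebooks based on it would need to be stopped first.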

      Additional info:

       

      How often are the image tags (tensorflow:py3.8-cuda-11.0.3) updated? 
      I have a feeling that if we issue a new, updated version of a notebook image, we will have to bump the image tag so that the update actually gets picked up.

       

       

            svelosol@redhat.com Samuel Veloso (Inactive)
            egranger@redhat.com Erwan Granger
            Luca Giorgi Luca Giorgi