Red Hat OpenShift Data Science / RHODS-1990

RHODS update (1.1.1-21 -> 1.1.1-34) is incomplete


    • 1.3.0-6
    • Sprints: MODH Sprint 31, MODH Sprint 32, MODH Sprint 34, MODH Sprint 35

      Description of problem:

      The update of RHODS in my environment is incomplete: the operator and application pods were updated, but the notebook images were not.

      Prerequisites (if any, like setup, operators/versions):

      Steps to Reproduce:

      1. Deploy RHODS 1.1.1-21.
      2. Wait for version 1.1.1-34 to become available.
      3. Wait for the update to be applied.
      4. Check whether the notebook images have been updated.
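      Step 4 can be checked from the command line. The sketch below is a minimal example under two assumptions: that the notebook images are produced by the Builds in the `redhat-ods-applications` namespace, and that a Build which started before the update was applied cannot contain anything from the new version. The function name `report_builds_before` and the cutoff timestamp are hypothetical, not part of RHODS or `oc`.

```shell
# Hypothetical helper (not part of RHODS or oc): print the names of builds
# whose start time is earlier than a cutoff timestamp.
# Input on stdin: one "name<TAB>startTimestamp" pair per line, with
# timestamps in RFC 3339 UTC form (e.g. 2021-09-16T12:00:00Z), so that a
# plain string comparison orders them chronologically.
report_builds_before() {
  local cutoff="$1"
  awk -F '\t' -v cutoff="$cutoff" '
    # Skip builds with no start time yet (e.g. still pending);
    # report those that started before the cutoff.
    NF >= 2 && $2 != "" && $2 < cutoff { print $1 }
  '
}
```

      Example input can be produced with `oc -n redhat-ods-applications get builds -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.startTimestamp}{"\n"}{end}' | report_builds_before 2021-10-04T00:00:00Z`. The cutoff used here is the date 1.1.1-34 was promoted to production; the update may reach a given cluster later, so the time the new CSV was installed is a more precise choice.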

      Actual results:

      Notebook images are not updated. 

      Expected results:

      All parts of RHODS should be updated.

      Reproducibility (Always/Intermittent/Only Once):

      Unknown; there have not been enough occasions to check.

      Build Details:

      Going from 1.1.1-21 to 1.1.1-34. 

      Workaround:

      Additional info:

       

      The pilot cluster (https://red.ht/rhods-pilot) was deployed on September 16th.

      At that time, RHODS 1.1.1-21 was installed in the way a customer would (using the RHODS add-on in the console.redhat.com interface). 

      On October 4th, build 1.1.1-34 was promoted to production (https://gitlab.cee.redhat.com/service/managed-tenants/-/merge_requests/1300).

       

      This is not a cluster-specific problem. I have another cluster exhibiting the exact same behavior.  

       

       

      Some components clearly did get upgraded:

      • The CSV (ClusterServiceVersion) has changed:
      ✦ ➜ oc get csv | grep rhods
      rhods-operator.1.1.1-34                   Red Hat OpenShift Data Science                1.1.1-34          rhods-operator.1.1.1-29                   Succeeded
      

       

      • The RHODS operator pod has restarted:
      ✦ ➜ oc -n redhat-ods-operator get pods
      NAME                                      READY   STATUS    RESTARTS   AGE
      cloud-resource-operator-775d69f89-zmpv5   1/1     Running   0          2d12h
      rhods-operator-744b95cc5c-7z5mv           1/1     Running   0          20h
      

       

      • It is also using new images:
      ✦ ➜ oc -n redhat-ods-operator get pods -o yaml | grep -i '\ image\:' | sort -u
            image: quay.io/modh/cloud-resource-operator-container:1.6.0-1
            image: quay.io/modh/odh-deployer-container:v1.1.1-34
            image: quay.io/modh/odh-operator-container:v1.1.1-34
      
      • The pods in the `redhat-ods-applications` namespace also restarted, and they are using new image versions as well:
       ➜ oc -n redhat-ods-applications get pods -o yaml | grep -i '\ image\:' | sort -u
            image: quay.io/modh/jupyterhub-db-probe:v0.3
            image: quay.io/modh/odh-cmap-puller-container:v1.1.1-34
            image: quay.io/modh/odh-dashboard-container:v1.1.1-34
          - image: quay.io/modh/odh-dashboard-container:v1.1.1-34
            image: quay.io/modh/odh-deployer-container:v1.1.1-34
            image: quay.io/modh/odh-jupyterhub-container:v1.1.1-34
            image: quay.io/modh/odh-leader-election-container:v1.1.1-34
            image: quay.io/modh/odh-traefik-container:v1.1.1-34
            image: quay.io/openshift/origin-deployer:4.7.0
            image: quay.io/openshift/origin-oauth-proxy:4.7.0
            image: registry.access.redhat.com/ubi8/ubi-micro:8.4
      
      • However, none of the image builds have re-run:
      ✦ ➜ oc -n redhat-ods-applications get builds
      NAME                                        TYPE     FROM          STATUS     STARTED       DURATION
      11.0.3-cuda-s2i-core-ubi8-1                 Docker   Git@f390807   Complete   2 weeks ago   5m29s
      11.0.3-cuda-s2i-base-ubi8-1                 Docker   Git@f390807   Complete   2 weeks ago   4m7s
      11.0.3-cuda-s2i-py38-ubi8-1                 Docker   Git@4d85c35   Complete   2 weeks ago   4m36s
      11.0.3-cuda-s2i-thoth-ubi8-py38-1           Docker   Git@f485d7e   Complete   2 weeks ago   9m40s
      s2i-minimal-gpu-cuda-11.0.3-notebook-1      Docker   Git@8fae539   Complete   2 weeks ago   8m54s
      s2i-pytorch-gpu-cuda-11.0.3-notebook-1      Source   Git@2dd469e   Complete   2 weeks ago   11m44s
      s2i-tensorflow-gpu-cuda-11.0.3-notebook-1   Source   Git@94e8433   Complete   2 weeks ago   10m28s
      openvino-notebook-1                         Docker   Git@24a3b71   Complete   2 weeks ago   11m6s
      
      • The BuildConfigs for the notebook images also appear not to have been updated:
      ✦ ➜ oc -n redhat-ods-applications get buildconfigs
      NAME                                      TYPE     FROM       LATEST
      11.0.3-cuda-s2i-base-ubi8                 Docker   Git@nb-2   1
      11.0.3-cuda-s2i-core-ubi8                 Docker   Git@nb-2   1
      11.0.3-cuda-s2i-py38-ubi8                 Docker   Git@nb-1   1
      11.0.3-cuda-s2i-thoth-ubi8-py38           Docker   Git@nb-2   1
      openvino-notebook                         Docker   Git@main   1
      s2i-minimal-gpu-cuda-11.0.3-notebook      Docker   Git@nb-2   1
      s2i-pytorch-gpu-cuda-11.0.3-notebook      Source   Git@nb-2   1
      s2i-tensorflow-gpu-cuda-11.0.3-notebook   Source   Git@nb-2   1
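      The image lists above come from grepping pod YAML. A more direct check is sketched below, assuming every `quay.io/modh/` image shipped with a RHODS build carries that build's tag (`v1.1.1-34` here). That assumption does not hold for every image: `jupyterhub-db-probe:v0.3` does not follow the RHODS build numbering, so the function will report it regardless of whether the update succeeded. The function name `report_mismatched_images` is hypothetical.

```shell
# Hypothetical helper (not part of RHODS or oc): print quay.io/modh images
# whose tag differs from the expected RHODS build tag, without duplicates.
# Input on stdin: one image reference per line, e.g.
#   quay.io/modh/odh-dashboard-container:v1.1.1-34
report_mismatched_images() {
  local expected_tag="$1"
  awk -v want="$expected_tag" '
    /^quay\.io\/modh\// {
      # Treat whatever follows the last ":" as the tag.
      n = split($0, parts, ":")
      if (parts[n] != want) print $0
    }
  ' | sort -u
}
```

      For example: `oc -n redhat-ods-applications get pods -o jsonpath='{..image}' | tr ' ' '\n' | report_mismatched_images v1.1.1-34`. Note that `{..image}` matches every `image` field, including those in each pod's status, which may be reported in a different form (for example by digest).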
      
      

       

      Summary:

      • It is very difficult to tell whether all components have been updated.
      • As a result, it is easy to believe that the environment has been fully updated when in reality it was only partially updated.
      • This leaves the customer's environment in a state where roughly 60% of the software is from one version and 40% is from the other.
        Even if there are no noticeable side effects, this is a configuration that has not been tested or vetted by Red Hat.
      • The versioning of the notebook BuildConfigs (Git tags named nb-1, nb-2, ...) is very opaque.
        I was told that the TensorFlow BuildConfig (s2i-tensorflow-gpu-cuda-11.0.3-notebook) should be at nb-4, but mine is at nb-2.
        There is no way for me to know that. If the tag were named "rhods1.1.1-34", it would be much easier to see that something has gone wrong.
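      To illustrate the naming suggestion: if each BuildConfig's Git tag were named after the RHODS version it belongs to (the proposed `rhods<version>` scheme, e.g. `rhods1.1.1-34`), detecting a partial update would reduce to comparing every tag with the installed CSV version. A minimal sketch assuming that naming scheme; `report_unaligned_refs` is a hypothetical name, and with today's `nb-N` tags every BuildConfig would be reported.

```shell
# Hypothetical helper illustrating the proposed naming scheme: print every
# BuildConfig whose Git ref is not "rhods<version>".
# Input on stdin: one "name<TAB>ref" pair per line.
report_unaligned_refs() {
  local version="$1"
  awk -F '\t' -v want="rhods${version}" '
    NF >= 2 && $2 != want { printf "%s (ref %s, expected %s)\n", $1, $2, want }
  '
}
```

      The input could come from `oc -n redhat-ods-applications get buildconfigs -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.source.git.ref}{"\n"}{end}'`, with the version taken from the VERSION column of `oc get csv`.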

       

        Attachments:
        1. builds.png (82 kB, Erwan Granger)
        2. fakeop.yaml (2 kB, Trevor Mckay)
        3. Screenshot from 2021-10-28 12-27-08.png (242 kB, Pablo Felix)
        4. Screenshot from 2021-11-03 09-05-56.png (225 kB, Pablo Felix)
        5. Screenshot from 2021-11-03 09-06-34.png (258 kB, Pablo Felix)
        6. Screenshot from 2021-11-03 09-06-46.png (263 kB, Pablo Felix)
        7. Screenshot from 2021-11-03 09-07-12.png (248 kB, Pablo Felix)

              tmckay@redhat.com Trevor Mckay (Inactive)
              egranger@redhat.com Erwan Granger
              Pablo Felix (Inactive), Sweta Anandpara
              Pablo Felix (Inactive)
              Votes: 0
              Watchers: 11
