Bug
Resolution: Done
Critical
1.3.0-6
MODH Sprint 31, MODH Sprint 32, MODH Sprint 34, MODH Sprint 35
Description of problem:
The update of RHODS in my environment is incomplete at best.
Prerequisites (if any, like setup, operators/versions):
Steps to Reproduce
- Deploy RHODS 1.1.1-21
- Wait for version 1.1.1-34 to be available
- Wait for update to happen
- Check if the notebook images have been updated (see the commands sketched below).
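One way to make that last check concrete (this is only a sketch, assuming the default RHODS namespaces; the same commands are used in the Additional info below):
oc get csv | grep rhods                          # installed RHODS operator version
oc -n redhat-ods-applications get builds         # have the notebook image builds re-run?
oc -n redhat-ods-applications get buildconfigs   # which git ref do the BuildConfigs track?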
Actual results:
Notebook images are not updated.
Expected results:
All parts of RHODS should be updated.
Reproducibility (Always/Intermittent/Only Once):
Don't know. Not enough occasions to check.
Build Details:
Going from 1.1.1-21 to 1.1.1-34.
Workaround:
Additional info:
The pilot cluster (https://red.ht/rhods-pilot) was deployed on September 16th.
At that time, RHODS 1.1.1-21 was installed in the way a customer would (using the RHODS add-on in the console.redhat.com interface).
On October 4th, build 1.1.1-34 was promoted to production. (https://gitlab.cee.redhat.com/service/managed-tenants/-/merge_requests/1300)
This is not a cluster-specific problem. I have another cluster exhibiting the exact same behavior.
I can tell that some things got upgraded:
- The csv has changed:
✦ ➜ oc get csv | grep rhods
rhods-operator.1.1.1-34   Red Hat OpenShift Data Science   1.1.1-34   rhods-operator.1.1.1-29   Succeeded
- RHODS operator pod has restarted:
✦ ➜ oc -n redhat-ods-operator get pods
NAME                                      READY   STATUS    RESTARTS   AGE
cloud-resource-operator-775d69f89-zmpv5   1/1     Running   0          2d12h
rhods-operator-744b95cc5c-7z5mv           1/1     Running   0          20h
- And it's using a new image:
✦ ➜ oc -n redhat-ods-operator get pods -o yaml | grep -i '\ image\:' | sort -u
  image: quay.io/modh/cloud-resource-operator-container:1.6.0-1
  image: quay.io/modh/odh-deployer-container:v1.1.1-34
  image: quay.io/modh/odh-operator-container:v1.1.1-34
- The pods in the namespace `redhat-ods-applications` also restarted, and they are using new image versions as well:
➜ oc -n redhat-ods-applications get pods -o yaml | grep -i '\ image\:' | sort -u
image: quay.io/modh/jupyterhub-db-probe:v0.3
image: quay.io/modh/odh-cmap-puller-container:v1.1.1-34
image: quay.io/modh/odh-dashboard-container:v1.1.1-34
image: quay.io/modh/odh-deployer-container:v1.1.1-34
image: quay.io/modh/odh-jupyterhub-container:v1.1.1-34
image: quay.io/modh/odh-leader-election-container:v1.1.1-34
image: quay.io/modh/odh-traefik-container:v1.1.1-34
image: quay.io/openshift/origin-deployer:4.7.0
image: quay.io/openshift/origin-oauth-proxy:4.7.0
image: registry.access.redhat.com/ubi8/ubi-micro:8.4
- However, none of the Image Builds have re-run:
✦ ➜ oc -n redhat-ods-applications get builds
NAME                                        TYPE     FROM          STATUS     STARTED       DURATION
11.0.3-cuda-s2i-core-ubi8-1                 Docker   Git@f390807   Complete   2 weeks ago   5m29s
11.0.3-cuda-s2i-base-ubi8-1                 Docker   Git@f390807   Complete   2 weeks ago   4m7s
11.0.3-cuda-s2i-py38-ubi8-1                 Docker   Git@4d85c35   Complete   2 weeks ago   4m36s
11.0.3-cuda-s2i-thoth-ubi8-py38-1           Docker   Git@f485d7e   Complete   2 weeks ago   9m40s
s2i-minimal-gpu-cuda-11.0.3-notebook-1      Docker   Git@8fae539   Complete   2 weeks ago   8m54s
s2i-pytorch-gpu-cuda-11.0.3-notebook-1      Source   Git@2dd469e   Complete   2 weeks ago   11m44s
s2i-tensorflow-gpu-cuda-11.0.3-notebook-1   Source   Git@94e8433   Complete   2 weeks ago   10m28s
openvino-notebook-1                         Docker   Git@24a3b71   Complete   2 weeks ago   11m6s
- It would also seem that the BuildConfigs for the notebook images have not been updated:
✦ ➜ oc -n redhat-ods-applications get buildconfigs
NAME                                      TYPE     FROM       LATEST
11.0.3-cuda-s2i-base-ubi8                 Docker   Git@nb-2   1
11.0.3-cuda-s2i-core-ubi8                 Docker   Git@nb-2   1
11.0.3-cuda-s2i-py38-ubi8                 Docker   Git@nb-1   1
11.0.3-cuda-s2i-thoth-ubi8-py38           Docker   Git@nb-2   1
openvino-notebook                         Docker   Git@main   1
s2i-minimal-gpu-cuda-11.0.3-notebook      Docker   Git@nb-2   1
s2i-pytorch-gpu-cuda-11.0.3-notebook      Source   Git@nb-2   1
s2i-tensorflow-gpu-cuda-11.0.3-notebook   Source   Git@nb-2   1
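As a side note, the git ref each BuildConfig tracks can be listed directly; this is only a sketch and assumes the BuildConfigs use a plain git source, which is what the output above suggests:
oc -n redhat-ods-applications get buildconfigs \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.source.git.ref}{"\n"}{end}'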
Summary:
- It's very difficult to tell whether all pieces have been updated or not.
- So it's easy to think that the environment has been fully updated when in reality, it was only partially updated.
- This leaves the customer's environment in a state where 60% of the software is from one version, and 40% is from the other.
Even if there are no noticeable side effects, this is a configuration that has not been tested or vetted by Red Hat.
- The process for the notebook BuildConfigs (using tags called nb-1, nb-2) is very opaque.
I was told that the TensorFlow image should be at nb-4, and mine is at nb-2.
There is no way for me to know that. If the tag were "rhods1.1.1-34", it would make it easier for me to see that something has gone wrong.
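For what it's worth, a quick way to see the mixed-version state in one place could be something like the following (only a sketch, limited to the two namespaces shown above, and it only covers running pods, not the notebook image builds):
for ns in redhat-ods-operator redhat-ods-applications; do
  echo "== $ns =="
  # list the unique image references used by the pods in this namespace
  oc -n "$ns" get pods -o yaml | grep -i '\ image\:' | sort -u
done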