Bug
Resolution: Done
Critical
1.3.0-6
MODH Sprint 31, MODH Sprint 32, MODH Sprint 34, MODH Sprint 35
Description of problem:
The update of RHODS in my environment is incomplete at best.
Prerequisites (if any, like setup, operators/versions):
Steps to Reproduce
- Deploy RHODS 1.1.1-21
- Wait for version 1.1.1-34 to be available
- Wait for update to happen
- Check if the notebook images have been updated (see the commands sketched below).
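One way to make that last check concrete (this is only a sketch, assuming the default RHODS namespaces; the same commands are used in the Additional info below):
oc get csv | grep rhods                          # installed RHODS operator version
oc -n redhat-ods-applications get builds         # have the notebook image builds re-run?
oc -n redhat-ods-applications get buildconfigs   # which git ref do the BuildConfigs track?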
Actual results:
Notebook images are not updated.
Expected results:
All parts of RHODS should be updated.
Reproducibility (Always/Intermittent/Only Once):
Don't know. Not enough occasions to check.
Build Details:
Going from 1.1.1-21 to 1.1.1-34.
Workaround:
Additional info:
The pilot cluster (https://red.ht/rhods-pilot) was deployed on September 16th.
At that time, RHODS 1.1.1-21 was installed in the way a customer would (using the RHODS add-on in the console.redhat.com interface).
On October 4th, build 1.1.1-34 was promoted to production. (https://gitlab.cee.redhat.com/service/managed-tenants/-/merge_requests/1300)
This is not a cluster-specific problem. I have another cluster exhibiting the exact same behavior.
I can tell that some things got upgraded:
- The csv has changed:
✦ ➜ oc get csv | grep rhods
rhods-operator.1.1.1-34   Red Hat OpenShift Data Science   1.1.1-34   rhods-operator.1.1.1-29   Succeeded
- RHODS operator pod has restarted:
✦ ➜ oc -n redhat-ods-operator get pods
NAME                                      READY   STATUS    RESTARTS   AGE
cloud-resource-operator-775d69f89-zmpv5   1/1     Running   0          2d12h
rhods-operator-744b95cc5c-7z5mv           1/1     Running   0          20h
- And it's using a new image:
✦ ➜ oc -n redhat-ods-operator get pods -o yaml | grep -i '\ image\:' | sort -u
  image: quay.io/modh/cloud-resource-operator-container:1.6.0-1
  image: quay.io/modh/odh-deployer-container:v1.1.1-34
  image: quay.io/modh/odh-operator-container:v1.1.1-34
- The pods in the namespace `redhat-ods-applications` also restarted, and they are using new image versions as well:
➜ oc -n redhat-ods-applications get pods -o yaml | grep -i '\ image\:' | sort -u
image: quay.io/modh/jupyterhub-db-probe:v0.3
image: quay.io/modh/odh-cmap-puller-container:v1.1.1-34
image: quay.io/modh/odh-dashboard-container:v1.1.1-34
image: quay.io/modh/odh-deployer-container:v1.1.1-34
image: quay.io/modh/odh-jupyterhub-container:v1.1.1-34
image: quay.io/modh/odh-leader-election-container:v1.1.1-34
image: quay.io/modh/odh-traefik-container:v1.1.1-34
image: quay.io/openshift/origin-deployer:4.7.0
image: quay.io/openshift/origin-oauth-proxy:4.7.0
image: registry.access.redhat.com/ubi8/ubi-micro:8.4
- However, none of the Image Builds have re-run:
✦ ➜ oc -n redhat-ods-applications get builds
NAME                                        TYPE     FROM          STATUS     STARTED       DURATION
11.0.3-cuda-s2i-core-ubi8-1                 Docker   Git@f390807   Complete   2 weeks ago   5m29s
11.0.3-cuda-s2i-base-ubi8-1                 Docker   Git@f390807   Complete   2 weeks ago   4m7s
11.0.3-cuda-s2i-py38-ubi8-1                 Docker   Git@4d85c35   Complete   2 weeks ago   4m36s
11.0.3-cuda-s2i-thoth-ubi8-py38-1           Docker   Git@f485d7e   Complete   2 weeks ago   9m40s
s2i-minimal-gpu-cuda-11.0.3-notebook-1      Docker   Git@8fae539   Complete   2 weeks ago   8m54s
s2i-pytorch-gpu-cuda-11.0.3-notebook-1      Source   Git@2dd469e   Complete   2 weeks ago   11m44s
s2i-tensorflow-gpu-cuda-11.0.3-notebook-1   Source   Git@94e8433   Complete   2 weeks ago   10m28s
openvino-notebook-1                         Docker   Git@24a3b71   Complete   2 weeks ago   11m6s
- It would also seem that the BuildConfigs for the notebook images have not been updated:
✦ ➜ oc -n redhat-ods-applications get buildconfigs
NAME                                      TYPE     FROM       LATEST
11.0.3-cuda-s2i-base-ubi8                 Docker   Git@nb-2   1
11.0.3-cuda-s2i-core-ubi8                 Docker   Git@nb-2   1
11.0.3-cuda-s2i-py38-ubi8                 Docker   Git@nb-1   1
11.0.3-cuda-s2i-thoth-ubi8-py38           Docker   Git@nb-2   1
openvino-notebook                         Docker   Git@main   1
s2i-minimal-gpu-cuda-11.0.3-notebook      Docker   Git@nb-2   1
s2i-pytorch-gpu-cuda-11.0.3-notebook      Source   Git@nb-2   1
s2i-tensorflow-gpu-cuda-11.0.3-notebook   Source   Git@nb-2   1
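As a side note, the git ref each BuildConfig tracks can be listed directly; this is only a sketch and assumes the BuildConfigs use a plain git source, which is what the output above suggests:
oc -n redhat-ods-applications get buildconfigs \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.source.git.ref}{"\n"}{end}'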
Summary:
- It's very difficult to tell whether all pieces have been updated or not.
- So it's easy to think that the environment has been fully updated when in reality, it was only partially updated.
- This leaves the customer's environment in a state where 60% of the software is from one version, and 40% is from the other.
Even if there are no noticeable side effects, this is a configuration that has not been tested or vetted by Red Hat.
- The process for the notebook BuildConfigs (using tags called nb-1, nb-2) is very opaque.
I was told that the TensorFlow image should be at nb-4, and mine is at nb-2.
There is no way for me to know that. If the tag were "rhods1.1.1-34", it would make it easier for me to see that something has gone wrong.
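For what it's worth, a quick way to see the mixed-version state in one place could be something like the following (only a sketch, limited to the two namespaces shown above, and it only covers running pods, not the notebook image builds):
for ns in redhat-ods-operator redhat-ods-applications; do
  echo "== $ns =="
  # list the unique image references used by the pods in this namespace
  oc -n "$ns" get pods -o yaml | grep -i '\ image\:' | sort -u
done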