-
Ticket
-
Resolution: Done
-
Major
-
None
-
OSSM 2.4.3
Description of problem:
In disconnected environments, certain conditions cause the updateOauthProxyConfig [1] function to fail at startup, e.g.:
- Shortly after a SNO node reboot, the Image API may not be available yet, which prevents the function from reading the dockerImageReference from the ImageStream.
- The oauth-proxy ImageStream's dockerImageReference is empty because of a (temporary) pull failure during operator startup.
Once the operator has started [2], the detection never runs again. As a result the managed prometheus pod remains in ImagePullBackOff since it cannot pull the oauth-proxy image.
Version-Release number of selected component (if applicable):
- OCP 4.12.32
- OSSM 2.4.3-0
How reproducible:
100% in a disconnected environment
Steps to Reproduce:
$ curl -L https://mirror.openshift.com/pub/openshift-v4/clients/ocp/4.12.34/oc-mirror.tar.gz|tar -xzf -
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 52.9M  100 52.9M    0     0  19.0M      0  0:00:02  0:00:02 --:--:-- 19.0M
$ chmod +x ./oc-mirror
$ ./oc-mirror version
Client Version: version.Info{Major:"", Minor:"", GitVersion:"4.12.0-202308291001.p0.g3ac49d9.assembly.stream-3ac49d9", GitCommit:"3ac49d9bd7c2193ede794e328dfa1142d7735f2e", GitTreeState:"clean", BuildDate:"2023-08-29T13:29:58Z", GoVersion:"go1.19.10 X:strictfipsruntime", Compiler:"gc", Platform:"linux/amd64"}
$ cat > ossm-is.yaml <<EOF
kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
archiveSize: 4
storageConfig:
  registry:
    imageURL: mirror.local/mirror/oc-mirror-metadata
    skipTLS: true
mirror:
  platform:
    channels:
    - name: stable-4.12
      minVersion: 4.12.30
      type: ocp
    graph: true
  operators:
  - catalog: registry.redhat.io/redhat/redhat-operator-index:v4.12
    packages:
    - name: servicemeshoperator
      channels:
      - name: stable
    - name: jaeger-product
      channels:
      - name: stable
    - name: kiali-ossm
      channels:
      - name: stable
    - name: elasticsearch-operator
      channels:
      - name: stable
EOF
$ ./oc-mirror --dest-skip-tls --config=./ossm-is.yaml docker://mirror.local/ossm-ocp-412
<...>
info: Mirroring completed in 7m0.03s (149.2MB/s)
Rendering catalog image "mirror.local/ossm-ocp-412/redhat/redhat-operator-index:v4.12" with file-based catalog
Writing image mapping to oc-mirror-workspace/results-1695653324/mapping.txt
Writing UpdateService manifests to oc-mirror-workspace/results-1695653324
Writing CatalogSource manifests to oc-mirror-workspace/results-1695653324
Writing ICSP manifests to oc-mirror-workspace/results-1695653324

Installed a cluster with the imagecontentsources:

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.32   True        False         6m43s   Cluster version is 4.12.32
$ oc apply -f oc-mirror-workspace/results-1695653324/catalogSource-redhat-operator-index.yaml
catalogsource.operators.coreos.com/redhat-operator-index created
$ oc patch OperatorHub cluster --type json -p '[{"op": "add", "path": "/spec/disableAllDefaultSources", "value": true}]'
operatorhub.config.openshift.io/cluster patched
# installing all 4 operators through console
$ oc get csv -n openshift-operators
NAME                            DISPLAY                                          VERSION    REPLACES                                   PHASE
elasticsearch-operator.v5.7.6   OpenShift Elasticsearch Operator                 5.7.6      elasticsearch-operator.v5.7.5              Succeeded
jaeger-operator.v1.47.0-2       Red Hat OpenShift distributed tracing platform   1.47.0-2   jaeger-operator.v1.42.0-5-0.1687199951.p   Succeeded
kiali-operator.v1.65.8          Kiali Operator                                   1.65.8     kiali-operator.v1.65.7                     Succeeded
servicemeshoperator.v2.4.3      Red Hat OpenShift Service Mesh                   2.4.3-0    servicemeshoperator.v2.4.2                 Succeeded
# create ServiceMeshControlPlanes through console
$ oc new-project test-ossm
$ oc apply -f - <<EOF
apiVersion: maistra.io/v1
kind: ServiceMeshMemberRoll
metadata:
  name: default
  namespace: istio-system
spec:
  members:
    # a list of projects joined into the service mesh
    - test-ossm
EOF
servicemeshmemberroll.maistra.io/default created
$ oc get pods -n istio-system
NAME                            READY   STATUS             RESTARTS   AGE
istiod-basic-74468b6865-bcqnt   1/1     Running            0          5m38s
prometheus-54bddd4b76-pstrl     2/3     ImagePullBackOff   0          5m26s
$ oc get deployment -n istio-system prometheus -o go-template='{{range .spec.template.spec.containers}}{{.name}}{{": "}}{{.image}}{{"\n"}}{{end}}'
prometheus-proxy: registry.redhat.io/openshift4/ose-oauth-proxy:v4.9
prometheus: registry.redhat.io/openshift4/ose-prometheus@sha256:203dd4282f288c5781ed20cb455e37ac82389bf8dc882d3858c9f609b7e06073
config-reloader: registry.redhat.io/openshift4/ose-prometheus-config-reloader@sha256:f0bcbfb672d79ef087b785c26f0ea2976d690455d2ab1383cd40dcfc8fb7ea2a
Actual results:
$ oc get events -n istio-system |grep oauth-proxy
3h7m   Normal   BackOff   pod/prometheus-54bddd4b76-pstrl   Back-off pulling image "registry.redhat.io/openshift4/ose-oauth-proxy:v4.9"
3h7m   Normal   Pulling   pod/prometheus-54bddd4b76-r7pmh   Pulling image "registry.redhat.io/openshift4/ose-oauth-proxy:v4.9"
Expected results:
The prometheus pod should run successfully.
Additional info:
An earlier bug report [3] for this situation concluded that the ImageStream must be present before the operator is installed, and that restarting the operator works around the problem, but that is not a viable option for managed/automated clusters.
Ideally, updateOauthProxyConfig should run on every reconcile loop so that the correct (mirrored) oauth-proxy image is eventually detected.
- is caused by
-
OSSM-5347 [RFE] Detect oauth-proxy image after operator bootstrap
-
- Closed
-
- links to