Type: Task
Resolution: Obsolete
Priority: Major
Sprint: Tracing Sprint #33, Tracing Sprint #34
Issue OSSM-104 describes a scenario that I have been trying to reproduce without success, and I need some help from our QE team.
When installing the "Red Hat Service Mesh" operator via OLM (Operator Hub) in an OpenShift cluster, the Jaeger Operator is brought along.
Once a ServiceMeshControlPlane CR is applied, a Jaeger instance is provisioned as well, via the following CR:
{
  "apiVersion": "jaegertracing.io/v1",
  "kind": "Jaeger",
  "metadata": {
    "annotations": {
      "maistra.io/mesh-generation": "1.0.2-7.el8-2"
    },
    "labels": {
      "app.kubernetes.io/component": "tracing",
      "app.kubernetes.io/instance": "istio-system",
      "app.kubernetes.io/managed-by": "maistra-istio-operator",
      "app.kubernetes.io/name": "tracing",
      "app.kubernetes.io/part-of": "istio",
      "app.kubernetes.io/version": "1.0.2-7.el8-2",
      "chart": "tracing",
      "heritage": "Tiller",
      "maistra.io/owner": "istio-system",
      "release": "istio"
    },
    "name": "jaeger",
    "namespace": "istio-system",
    "ownerReferences": [
      {
        "apiVersion": "maistra.io/v1",
        "blockOwnerDeletion": true,
        "controller": true,
        "kind": "ServiceMeshControlPlane",
        "name": "full-install",
        "uid": "9046e1ad-0576-11ea-9317-fa163e155851"
      }
    ]
  },
  "spec": {
    "affinity": {
      "nodeAffinity": {
        "preferredDuringSchedulingIgnoredDuringExecution": [
          {
            "preference": {
              "matchExpressions": [
                { "key": "beta.kubernetes.io/arch", "operator": "In", "values": [ "amd64" ] }
              ]
            },
            "weight": 2
          },
          {
            "preference": {
              "matchExpressions": [
                { "key": "beta.kubernetes.io/arch", "operator": "In", "values": [ "ppc64le" ] }
              ]
            },
            "weight": 2
          },
          {
            "preference": {
              "matchExpressions": [
                { "key": "beta.kubernetes.io/arch", "operator": "In", "values": [ "s390x" ] }
              ]
            },
            "weight": 2
          }
        ],
        "requiredDuringSchedulingIgnoredDuringExecution": {
          "nodeSelectorTerms": [
            {
              "matchExpressions": [
                { "key": "beta.kubernetes.io/arch", "operator": "In", "values": [ "amd64", "ppc64le", "s390x" ] }
              ]
            }
          ]
        }
      }
    },
    "agent": null,
    "allInOne": {
      "annotations": null,
      "options": {
        "log-level": "debug",
        "query": { "base-path": "/" }
      }
    },
    "ingress": {
      "annotations": null,
      "enabled": true,
      "openshift": {
        "htpasswdFile": "/etc/proxy/htpasswd/auth",
        "sar": "{\"namespace\": \"istio-system\", \"resource\": \"pods\", \"verb\": \"get\"}"
      },
      "security": "oauth-proxy"
    },
    "resources": {
      "limits": null,
      "requests": { "cpu": "10m", "memory": "128Mi" }
    },
    "storage": {
      "options": { "memory": { "max-traces": 50000 } }
    },
    "strategy": "allInOne",
    "ui": {
      "options": {
        "dependencies": { "menuEnabled": false },
        "menu": [
          {
            "items": [
              { "label": "Documentation", "url": "https://www.jaegertracing.io/docs/latest" },
              { "anchorTarget": "_self", "label": "Log Out", "url": "/oauth/sign_in" }
            ],
            "label": "About Jaeger"
          }
        ]
      }
    },
    "volumeMounts": [
      { "mountPath": "/etc/proxy/htpasswd", "name": "secret-htpasswd" }
    ],
    "volumes": [
      { "name": "secret-htpasswd", "secret": { "secretName": "htpasswd" } }
    ]
  }
}
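For reference, the field the operator keys off is .spec.ingress.security. A minimal sketch of how to read it from a CR dump; the grep fallback assumes the key appears only once in the document, and on a live cluster the jsonpath query (commented) is the cleaner option:

```shell
# Read .spec.ingress.security from a JSON dump of the CR. The inline
# document here is a trimmed-down stand-in for the full CR above.
cr='{"spec":{"ingress":{"enabled":true,"security":"oauth-proxy"}}}'
security=$(printf '%s' "$cr" | grep -o '"security":"[^"]*"' | cut -d'"' -f4)
echo "$security"   # oauth-proxy

# On a live cluster:
# oc get jaeger jaeger -n istio-system -o jsonpath='{.spec.ingress.security}'
```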
After everything stabilizes, this is the final state of the pods. Note that the Jaeger pod has two containers, one of them being the OAuth Proxy:
$ oc get pods -n istio-system
NAME                                      READY   STATUS    RESTARTS   AGE
grafana-b67df64b6-9mqmn                   2/2     Running   0          12m
istio-citadel-79979464d-5btml             1/1     Running   0          17m
istio-egressgateway-7d897695c4-2f6cf      1/1     Running   0          13m
istio-galley-6bb46858c5-htlpc             1/1     Running   0          16m
istio-ingressgateway-8465bbf788-c4tv5     1/1     Running   0          13m
istio-pilot-5d5bdb9556-qbzvt              2/2     Running   0          14m
istio-policy-588f4565bc-79hxf             2/2     Running   0          15m
istio-sidecar-injector-65cd4c8c6f-6mshr   1/1     Running   0          13m
istio-telemetry-8484556489-hpc2s          2/2     Running   0          15m
jaeger-57776787bc-t6m7t                   2/2     Running   0          16m
kiali-584b4c7f7c-9zxp2                    1/1     Running   0          11m
prometheus-b8bdc6b77-8xzcd                2/2     Running   0          16m
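Since the OAuth Proxy shows up as a second container in the jaeger pod, its presence or absence can be spotted mechanically from the READY column (2/2 with the proxy, 1/1 without). A small sketch; the pod line is copied from the listing above, and the live command is commented because it needs a cluster:

```shell
# Parse the READY column of an `oc get pods` line; "2/2" means the
# OAuth Proxy container is running alongside the Jaeger container.
pod_line='jaeger-57776787bc-t6m7t   2/2   Running   0   16m'
ready=$(echo "$pod_line" | awk '{print $2}')
echo "$ready"   # 2/2

# Against a live cluster (assumes the pod name prefix "jaeger-"):
# oc get pods -n istio-system | awk '/^jaeger-/{print $2}'
```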
The problem is that after some time, without any intervention, .Spec.Ingress.Security changes to "none", which causes the operator to reconcile the Jaeger deployment and remove the OAuth Proxy:
$ kubectl get pods -n istio-system
NAME                                      READY   STATUS    RESTARTS   AGE
grafana-b67df64b6-5b7cm                   2/2     Running   0          5h14m
istio-citadel-79979464d-g2895             1/1     Running   0          5h23m
istio-egressgateway-7d897695c4-8gwdh      1/1     Running   0          5h16m
istio-galley-6bb46858c5-7z5xt             1/1     Running   0          5h20m
istio-ingressgateway-8465bbf788-9mgtn     1/1     Running   0          5h16m
istio-pilot-5d5bdb9556-m6dg7              2/2     Running   0          5h17m
istio-policy-588f4565bc-hktf4             2/2     Running   0          5h19m
istio-sidecar-injector-65cd4c8c6f-gqmw8   1/1     Running   0          5h16m
istio-telemetry-8484556489-btkwv          2/2     Running   0          5h19m
jaeger-6966d9545b-xxkfz                   1/1     Running   0          4h43m
kiali-6d6f9cf658-kz4h2                    1/1     Running   0          5h11m
prometheus-b8bdc6b77-98r4m                2/2     Running   0          5h22m
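Since the flip happens "after some time" with no intervention, a crude polling loop may help catch the exact moment it occurs. A sketch, assuming the CR is named jaeger in istio-system; the comparison logic is plain shell and only the oc calls (commented) need a live cluster:

```shell
# Report a transition between two observed values of the field.
check_flip() {
  prev="$1"; cur="$2"
  if [ "$cur" != "$prev" ]; then
    echo "changed: $prev -> $cur"
  fi
}

# Poll the live CR every 30s (needs a cluster):
# prev=$(oc get jaeger jaeger -n istio-system -o jsonpath='{.spec.ingress.security}')
# while true; do
#   cur=$(oc get jaeger jaeger -n istio-system -o jsonpath='{.spec.ingress.security}')
#   check_flip "$prev" "$cur"
#   prev="$cur"
#   sleep 30
# done

check_flip "oauth-proxy" "none"   # changed: oauth-proxy -> none
```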
Note how both the replica set ID and the pod ID changed for the Jaeger deployment, indicating that the deployment resource was modified. For some reason, the route is not being updated, which leaves a mismatch between the two sides' assumptions about whether the OAuth Proxy is present. The route problem will be tracked in a separate JIRA.
This task is about reproducing the problem so that we can understand where the CR change is coming from. So far I have not been able to reproduce it: "Security" never changed to "none" for me.
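One way to trace where the change comes from is the API-server audit log, which records the username behind each update/patch to the CR. A sketch: the audit record below is fabricated for illustration, and the oc adm node-logs invocation (commented) depends on the cluster's audit configuration:

```shell
# Pull the acting username out of an audit record for the jaegers resource.
# This record is a made-up example, not real cluster output.
record='{"verb":"update","user":{"username":"system:serviceaccount:openshift-operators:jaeger-operator"},"objectRef":{"resource":"jaegers","name":"jaeger"}}'
user=$(printf '%s' "$record" | grep -o '"username":"[^"]*"' | cut -d'"' -f4)
echo "$user"

# On a live OpenShift 4.x cluster (path/availability depend on audit config):
# oc adm node-logs --role=master --path=kube-apiserver/audit.log \
#   | grep '"resource":"jaegers"' | grep -E '"verb":"(update|patch)"'
```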
This is how I tried to reproduce the problem with CRC, although ideally the reproduction would be attempted on the same infrastructure that rcernich1 is using.
$ crc config set memory 16384
$ crc config set cpus 6
$ crc start
$ crc console
# install the Red Hat Service Mesh and wait for the operators to get stable
$ oc login ...
$ oc new-project istio-system
$ oc create -n istio-system -f https://raw.githubusercontent.com/Maistra/istio-operator/maistra-1.0/deploy/examples/maistra_v1_servicemeshcontrolplane_cr_full.yaml
Versions:
$ oc version
Client Version: openshift-clients-4.2.1-201910220950
Server Version: 4.2.4
Kubernetes Version: v1.14.6+dc8862c

$ oc logs -n openshift-operators jaeger-operator-98dd965f5-hhnjb | head -n 1
time="2019-11-14T16:09:22Z" level=info msg=Versions arch=amd64 jaeger-operator=v1.13.1.redhat8 operator-sdk=v0.8.1 os=linux version=go1.11.5
relates to: TRACING-1117 Errors force jaeger operator to fallback to k8s, resetting security (Closed)