Ticket
Resolution: Won't Do
Priority: Major
Affects Version: OSSM 2.4.1
Issue:
When the Service Mesh control plane rolls out changes to the Elasticsearch cluster, the cluster does not return to the "green" state and the Elasticsearch operator keeps logging timeout errors. The rollout succeeds for the first Elasticsearch node, but the cluster never recovers after the second node restarts, and the operator logs entries like the following (note both the rollout timeouts and the "Unable to parse quantity" failures):
{"_ts":"2023-08-11T07:07:50.588103206Z","_level":"0","_component":"elasticsearch-operator_controllers_Elasticsearch","_message":"Beginning restart of node","cluster":"elasticsearch","namespace":"istio-system","node":"elasticsearch-cdm-istiosystemjaeger-1"} {"_ts":"2023-08-11T07:08:21.168672356Z","_level":"0","_component":"elasticsearch-operator_controllers_Elasticsearch","_message":"failed to perform rolling update","_error": {"msg":"timed out waiting for node to rollout","node":"elasticsearch-cdm-istiosystemjaeger-1"} ,"cluster":"elasticsearch","namespace":"istio-system"} {"ts":"2023-08-11T07:08:21.744476786Z","_level":"0","_component":"elasticsearch-operator_controllers_Elasticsearch","_message":"Unable to parse quantity","_error":{"msg":"quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP][-+]?[0-9]{_})$'"},"cluster":"elasticsearch","namespace":"istio-system","node":{"deploymentName":"elasticsearch-cdm-istiosystemjaeger-1","upgradeStatus":{"scheduledUpgrade":"True","underUpgrade":"True","upgradePhase":"preparationComplete"}},"value":""} {"ts":"2023-08-11T07:08:24.019570161Z","_level":"0","_component":"elasticsearch-operator_controllers_Elasticsearch","_message":"Unable to parse quantity","_error":{"msg":"quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP][-+]?[0-9]{_})$'"},"cluster":"elasticsearch","namespace":"istio-system","node":{"deploymentName":"elasticsearch-cdm-istiosystemjaeger-1","upgradeStatus":{"scheduledUpgrade":"True","underUpgrade":"True","upgradePhase":"preparationComplete"}},"value":""} {"_ts":"2023-08-11T07:08:39.108135125Z","_level":"0","_component":"elasticsearch-operator_controllers_Elasticsearch","_message":"Completed restart of node","cluster":"elasticsearch","namespace":"istio-system","node":"elasticsearch-cdm-istiosystemjaeger-1"} {"_ts":"2023-08-11T07:08:40.260323902Z","_level":"0","_component":"elasticsearch-operator_controllers_Elasticsearch","_message":"Beginning restart of node","cluster":"elasticsearch","namespace":"istio-system","node":"elasticsearch-cdm-istiosystemjaeger-2"} {"_ts":"2023-08-11T07:09:10.632943695Z","_level":"0","_component":"elasticsearch-operator_controllers_Elasticsearch","_message":"failed to perform rolling update","_error": {"msg":"timed out waiting for node to rollout","node":"elasticsearch-cdm-istiosystemjaeger-2"} ,"cluster":"elasticsearch","namespace":"istio-system"} {"_ts":"2023-08-11T07:09:40.907804383Z","_level":"0","_component":"elasticsearch-operator_controllers_Elasticsearch","_message":"unable to update node","_error": {"msg":"timed out waiting for node to rollout","node":"elasticsearch-cdm-istiosystemjaeger-2"} ,"cluster":"elasticsearch","namespace":"istio-system"}
Steps to reproduce:
Versions: OCP 4.11.34
Step 1: Install all four operators (Elasticsearch, Kiali, Jaeger, OpenShift Service Mesh): OSSM 2.4.1, Elasticsearch 5.7.4, Distributed Tracing Platform 1.42.0, Kiali 1.65.7.
NAME                                      DISPLAY                                          VERSION                  REPLACES                          PHASE
elasticsearch-operator.v5.7.4             OpenShift Elasticsearch Operator                 5.7.4                    elasticsearch-operator.v5.7.3     Succeeded
jaeger-operator.v1.42.0-5-0.1687199951.p  Red Hat OpenShift distributed tracing platform   1.42.0-5+0.1687199951.p  jaeger-operator.v1.34.1-5         Succeeded
kiali-operator.v1.65.7                    Kiali Operator                                   1.65.7                   kiali-operator.v1.65.6            Succeeded
openshift-gitops-operator.v1.5.10         Red Hat OpenShift GitOps                         1.5.10                   openshift-gitops-operator.v1.5.9  Succeeded
servicemeshoperator.v2.4.1                Red Hat OpenShift Service Mesh                   2.4.1-0                  servicemeshoperator.v2.4.0        Succeeded
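(The listing above is presumably the output of the command below; the namespace is an assumption, adjust to wherever the operators were installed:)

$ oc get csv -n openshift-operators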
Step 2: Configure the SMCP with the attached YAML (a rough sketch of the relevant fragment follows).
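The attachment is not reproduced here. As a loose, unverified sketch, reconstructed from the field path quoted in Step 4 (names and sizes are placeholders, not values taken from the attachment), the relevant fragment would look roughly like:

tracing:
  jaeger:
    elasticsearch:
      container:
        resources:
          requests:
            cpu: 250m
            memory: 1Gi
          limits:
            memory: 2Gi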
Step 3: Check if the pods are up and running:
NOTE: the elasticsearch pods should be READY 2/2.
$ oc get po
NAME                                                     READY   STATUS    RESTARTS   AGE
elasticsearch-cdm-istiosystemjaeger-1-6fc965fc74-g5k4f   2/2     Running   0          2m58s
elasticsearch-cdm-istiosystemjaeger-2-79b5c59f6-z2clw    2/2     Running   0          2m57s
elasticsearch-cdm-istiosystemjaeger-3-f98c44c54-lthnh    2/2     Running   0          2m56s
istio-egressgateway-6987f85dd-xnw9j                      1/1     Running   0          3m8s
istio-ingressgateway-8479c7b8d5-wnsjg                    1/1     Running   0          3m8s
istiod-basic-b5d86cfbb-fgnsg                             1/1     Running   0          3m49s
jaeger-collector-54c5f68dfc-wt8x5                        1/1     Running   0          2m4s
jaeger-query-7f9996696c-h696s                            3/3     Running   0          2m4s
kiali-6dc546c6df-4rjbb                                   1/1     Running   0          90s
prometheus-865b698cdf-hsqt4                              3/3     Running   0          3m24s
Step 4: Change the memory value at tracing.jaeger.elasticsearch.container.resources.memory in the SMCP.
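For example, edit the SMCP (the name basic is inferred from the istiod pod name in Step 3) and set the memory limit to a fractional quantity such as 2.3Gi; that specific value is an assumption, chosen because it is consistent with the millibyte rendering observed in Step 5:

$ oc -n istio-system edit smcp basic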
Step 5: Inspect the following CRs to verify whether the memory/CPU changes are reflected:
$ oc get jaeger jaeger -o json | jq .spec.storage.elasticsearch.resources
$ oc get elasticsearch elasticsearch -o json | jq .spec.nodeSpec.resources
$ oc get deployment <deployment-name> -o json | jq .spec.template.spec.containers[0].resources
Notice that the deployments report the memory limit in millibytes ("m" suffix), as follows:
{ "limits": { "memory": "2469606195200m" }, "requests": { "cpu": "250m", "memory": "1Gi" } }
Step 6: Check the Elasticsearch pod logs; they show only the following:
[2023-08-24T09:33:42,291][INFO ][o.e.c.s.ClusterApplierService] [elasticsearch-cdm-istiosystemjaeger-1] removed {{elasticsearch-cdm-istiosystemjaeger-2}{oZHe8TMSSZ-MF_LuI3Ndcg}{bwk0X67MRciZ0Tga0yxJxQ}{10.131.0.27}{10.131.0.27:9300},}, reason: apply cluster state (from master [master {elasticsearch-cdm-istiosystemjaeger-3}{SSUYBI2wR02pavTmkirruw}{UH7BrnmzStCLLzv8YJpXHg}{10.128.2.26}{10.128.2.26:9300} committed version [45]])
[2023-08-24T09:33:45,207][WARN ][r.suppressed             ] [elasticsearch-cdm-istiosystemjaeger-1] path: /.security/security/roles, params: {index=.security, id=roles, type=security}
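(These logs can be gathered with, e.g., the pod name from Step 3:)

$ oc logs -n istio-system -c elasticsearch elasticsearch-cdm-istiosystemjaeger-1-6fc965fc74-g5k4f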
Index list indicates:
$ indices
Thu Aug 24 10:45:36 UTC 2023
health status index     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
red    open   .security ezWq-Tz1Q-CzZpismjio5A 1   1
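The listing above appears to come from the indices helper script inside the Elasticsearch container; it can be re-run with, e.g. (pod name from Step 3):

$ oc exec -n istio-system -c elasticsearch elasticsearch-cdm-istiosystemjaeger-1-6fc965fc74-g5k4f -- indices

Note that the .security index is red and reports no document counts or store size.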
Step 7: Observe that the Elasticsearch cluster never returns to the GREEN state and the ES operator keeps logging rollout timeouts.
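Cluster health can be confirmed directly with the es_util helper in the Elasticsearch container (pod name from Step 3); the status field stays yellow/red instead of returning to green:

$ oc exec -n istio-system -c elasticsearch elasticsearch-cdm-istiosystemjaeger-1-6fc965fc74-g5k4f -- es_util --query=_cluster/health?pretty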