OpenShift Service Mesh / OSSM-4709

Changing settings on Elasticsearch in SMCP puts the Elastic cluster in unavailable state


    • Type: Ticket
    • Resolution: Won't Do
    • Priority: Major
    • OSSM 2.4.1
    • Jaeger

      Issue:

      When the service mesh control plane tries to roll out changes to the Elasticsearch cluster, the cluster does not return to the "green" state and the Elasticsearch operator keeps logging timeout errors. The rollout works for the first Elasticsearch pod, but the second pod never comes back, and the operator logs messages like these:

       

      {"_ts":"2023-08-11T07:07:50.588103206Z","_level":"0","_component":"elasticsearch-operator_controllers_Elasticsearch","_message":"Beginning restart of node","cluster":"elasticsearch","namespace":"istio-system","node":"elasticsearch-cdm-istiosystemjaeger-1"}
      {"_ts":"2023-08-11T07:08:21.168672356Z","_level":"0","_component":"elasticsearch-operator_controllers_Elasticsearch","_message":"failed to perform rolling update","_error":{"msg":"timed out waiting for node to rollout","node":"elasticsearch-cdm-istiosystemjaeger-1"},"cluster":"elasticsearch","namespace":"istio-system"}
      {"_ts":"2023-08-11T07:08:21.744476786Z","_level":"0","_component":"elasticsearch-operator_controllers_Elasticsearch","_message":"Unable to parse quantity","_error":{"msg":"quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'"},"cluster":"elasticsearch","namespace":"istio-system","node":{"deploymentName":"elasticsearch-cdm-istiosystemjaeger-1","upgradeStatus":{"scheduledUpgrade":"True","underUpgrade":"True","upgradePhase":"preparationComplete"}},"value":""}
      {"_ts":"2023-08-11T07:08:24.019570161Z","_level":"0","_component":"elasticsearch-operator_controllers_Elasticsearch","_message":"Unable to parse quantity","_error":{"msg":"quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'"},"cluster":"elasticsearch","namespace":"istio-system","node":{"deploymentName":"elasticsearch-cdm-istiosystemjaeger-1","upgradeStatus":{"scheduledUpgrade":"True","underUpgrade":"True","upgradePhase":"preparationComplete"}},"value":""}
      {"_ts":"2023-08-11T07:08:39.108135125Z","_level":"0","_component":"elasticsearch-operator_controllers_Elasticsearch","_message":"Completed restart of node","cluster":"elasticsearch","namespace":"istio-system","node":"elasticsearch-cdm-istiosystemjaeger-1"}
      {"_ts":"2023-08-11T07:08:40.260323902Z","_level":"0","_component":"elasticsearch-operator_controllers_Elasticsearch","_message":"Beginning restart of node","cluster":"elasticsearch","namespace":"istio-system","node":"elasticsearch-cdm-istiosystemjaeger-2"}
      {"_ts":"2023-08-11T07:09:10.632943695Z","_level":"0","_component":"elasticsearch-operator_controllers_Elasticsearch","_message":"failed to perform rolling update","_error":{"msg":"timed out waiting for node to rollout","node":"elasticsearch-cdm-istiosystemjaeger-2"},"cluster":"elasticsearch","namespace":"istio-system"}
      {"_ts":"2023-08-11T07:09:40.907804383Z","_level":"0","_component":"elasticsearch-operator_controllers_Elasticsearch","_message":"unable to update node","_error":{"msg":"timed out waiting for node to rollout","node":"elasticsearch-cdm-istiosystemjaeger-2"},"cluster":"elasticsearch","namespace":"istio-system"}
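The "Unable to parse quantity" entries above report an empty value (`"value":""`). As a minimal sketch of why an empty string is rejected, the check below applies the pattern that the Kubernetes quantity parser quotes in its error message, `^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$`. The helper name `is_valid_quantity` is hypothetical, and the real parser performs further validation beyond this shape check.

```python
import re

# Pattern quoted in the Kubernetes quantity parser's error message.
# This only checks the basic shape; it is not a full reimplementation.
QUANTITY_RE = re.compile(r"^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$")


def is_valid_quantity(value: str) -> bool:
    """Return True when the string has the basic shape of a quantity."""
    return QUANTITY_RE.match(value) is not None


if __name__ == "__main__":
    # The empty string has no leading digits, so it cannot match.
    for candidate in ["", "1Gi", "250m", "2469606195200m"]:
        print(repr(candidate), is_valid_quantity(candidate))
```

The empty string fails because the first group requires at least one digit or dot, which suggests the operator was handed a resource field with no value during the upgrade; the ticket does not say which field that was.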
      

       

       

      Steps to reproduce:

      Versions: OCP 4.11.34 

       

      Step 1: Install all four operators (Elasticsearch, Kiali, Jaeger, OpenShift Service Mesh): OSSM 2.4.1, Elasticsearch 5.7.4, distributed tracing platform 1.42.0, Kiali 1.65.7.

       

       

      NAME                                       DISPLAY                                          VERSION                   REPLACES                           PHASE
      elasticsearch-operator.v5.7.4              OpenShift Elasticsearch Operator                 5.7.4                     elasticsearch-operator.v5.7.3      Succeeded
      jaeger-operator.v1.42.0-5-0.1687199951.p   Red Hat OpenShift distributed tracing platform   1.42.0-5+0.1687199951.p   jaeger-operator.v1.34.1-5          Succeeded
      kiali-operator.v1.65.7                     Kiali Operator                                   1.65.7                    kiali-operator.v1.65.6             Succeeded
      openshift-gitops-operator.v1.5.10          Red Hat OpenShift GitOps                         1.5.10                    openshift-gitops-operator.v1.5.9   Succeeded
      servicemeshoperator.v2.4.1                 Red Hat OpenShift Service Mesh                   2.4.1-0                   servicemeshoperator.v2.4.0         Succeeded 

       

      Step 2: Configure the SMCP with the attached YAML (smcp.yaml).

      Step 3: Check that the pods are up and running:

      NOTE: The Elasticsearch pods should show 2/2 in the READY column.

       

       

      $ oc get po 
      NAME                                                     READY   STATUS    RESTARTS   AGE
      elasticsearch-cdm-istiosystemjaeger-1-6fc965fc74-g5k4f   2/2     Running   0          2m58s
      elasticsearch-cdm-istiosystemjaeger-2-79b5c59f6-z2clw    2/2     Running   0          2m57s
      elasticsearch-cdm-istiosystemjaeger-3-f98c44c54-lthnh    2/2     Running   0          2m56s
      istio-egressgateway-6987f85dd-xnw9j                      1/1     Running   0          3m8s
      istio-ingressgateway-8479c7b8d5-wnsjg                    1/1     Running   0          3m8s
      istiod-basic-b5d86cfbb-fgnsg                             1/1     Running   0          3m49s
      jaeger-collector-54c5f68dfc-wt8x5                        1/1     Running   0          2m4s
      jaeger-query-7f9996696c-h696s                            3/3     Running   0          2m4s
      kiali-6dc546c6df-4rjbb                                   1/1     Running   0          90s
      prometheus-865b698cdf-hsqt4                              3/3     Running   0          3m24s

      Step 4: Change the memory setting at tracing.jaeger.elasticsearch.container.resources.memory in the SMCP.

       

      Step 5: Inspect the following resources to verify whether the memory/CPU changes are reflected:

       

      $ oc get jaeger jaeger -o json | jq .spec.storage.elasticsearch.resources

       

       

      $ oc get elasticsearch elasticsearch -o json | jq .spec.nodeSpec.resources

       

       

      $ oc get deployment <deployment-name> -o json | jq '.spec.template.spec.containers[0].resources'

       

      Notice that the deployments report some values in milli-units (suffix "m"), for example:

       

      {
        "limits": {
          "memory": "2469606195200m"
        },
        "requests": {
          "cpu": "250m",
          "memory": "1Gi"
        }
      }
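To make the milli-unit value easier to read, here is a minimal sketch that converts suffix-style quantities to plain numbers. The function name `quantity_to_number` is hypothetical, and the sketch handles only the simple `<number><suffix>` form (no exponent notation). By this arithmetic, `2469606195200m` is 2469606195.2 bytes, which equals 2.3Gi. The ticket does not state which value was entered in the SMCP, so 2.3Gi is only an inference consistent with the numbers.

```python
from decimal import Decimal

# Binary (power-of-two) suffixes used by Kubernetes quantities.
BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30,
          "Ti": 2**40, "Pi": 2**50, "Ei": 2**60}

# Decimal (SI) suffixes; "m" means one thousandth.
DECIMAL = {"n": Decimal("1e-9"), "u": Decimal("1e-6"), "m": Decimal("1e-3"),
           "": Decimal(1), "k": Decimal("1e3"), "M": Decimal("1e6"),
           "G": Decimal("1e9"), "T": Decimal("1e12"),
           "P": Decimal("1e15"), "E": Decimal("1e18")}


def quantity_to_number(q: str) -> Decimal:
    """Convert a quantity such as '1Gi' or '250m' to a plain number."""
    for suffix, factor in BINARY.items():
        if q.endswith(suffix):
            return Decimal(q[:-len(suffix)]) * factor
    # All decimal suffixes are a single character, so strip it from the end.
    number = q.rstrip("numkMGTPE")
    suffix = q[len(number):]
    return Decimal(number) * DECIMAL[suffix]


if __name__ == "__main__":
    limit = quantity_to_number("2469606195200m")
    print(limit)                                   # value in bytes
    print(limit == quantity_to_number("2.3Gi"))    # same amount as 2.3Gi
```

Note that the memory limit (about 2.3 GiB) and the memory request (1Gi) in the output above are both valid quantities; the milli suffix is simply how a fractional byte count is serialized.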

       

       

      Step 6: The Elasticsearch pod logs show only the following:

       

       

      [2023-08-24T09:33:42,291][INFO ][o.e.c.s.ClusterApplierService] [elasticsearch-cdm-istiosystemjaeger-1] removed {{elasticsearch-cdm-istiosystemjaeger-2}{oZHe8TMSSZ-MF_LuI3Ndcg}{bwk0X67MRciZ0Tga0yxJxQ}{10.131.0.27}{10.131.0.27:9300},}, reason: apply cluster state (from master [master {elasticsearch-cdm-istiosystemjaeger-3}{SSUYBI2wR02pavTmkirruw}{UH7BrnmzStCLLzv8YJpXHg}{10.128.2.26}{10.128.2.26:9300} committed version [45]])
      [2023-08-24T09:33:45,207][WARN ][r.suppressed             ] [elasticsearch-cdm-istiosystemjaeger-1] path: /.security/security/roles, params: {index=.security, id=roles, type=security}
      

      Index list indicates: 

      $ indices
      Thu Aug 24 10:45:36 UTC 2023
      health status index     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
      red    open   .security ezWq-Tz1Q-CzZpismjio5A   1   1  

              

       

      Step 7: Observe that the Elasticsearch cluster never returns to the GREEN state and the Elasticsearch operator keeps logging timeouts.

        Attachments:
          smcp.yaml (2 kB, Anne Faulhaber)

            Assignee: Unassigned
            Reporter: Anne Faulhaber (rhn-support-afaulhab)