OpenShift Logging / LOG-2126

[Logging 5.3] Elasticsearch cluster upgrade stuck


    • NEW
    • VERIFIED
      Before this update, defining a toleration with no key and the `Exists` operator prevented the Elasticsearch Operator from completing an upgrade. With this update, this toleration no longer blocks the upgrade from completing.
    • Logging (LogExp) - Sprint 214, Logging (LogExp) - Sprint 215
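
The release note concerns a toleration that sets `operator: Exists` but no `key` (in YAML, a list entry such as `- operator: Exists`). Under the Kubernetes taint/toleration rules, such an entry tolerates every taint. The following is a minimal Python model of those matching rules, for illustration only: the `Taint`/`Toleration` classes and the `tolerates` function are hypothetical names, the model ignores `tolerationSeconds`, and it does not reproduce the Elasticsearch Operator's own upgrade logic.

~~~python
from dataclasses import dataclass


@dataclass
class Taint:
    key: str
    value: str = ""
    effect: str = "NoSchedule"


@dataclass
class Toleration:
    key: str = ""            # empty key is only valid together with operator "Exists"
    operator: str = "Equal"  # Kubernetes treats an unset operator as "Equal"
    value: str = ""
    effect: str = ""         # empty effect matches every taint effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Simplified model of Kubernetes toleration/taint matching (illustrative)."""
    # A non-empty effect must match the taint's effect exactly.
    if tol.effect and tol.effect != taint.effect:
        return False
    # A non-empty key must match; an empty key matches any taint key.
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        # Exists ignores the value, so an empty key + Exists tolerates everything.
        return True
    if tol.operator in ("", "Equal"):
        return tol.value == taint.value
    return False


catch_all = Toleration(operator="Exists")  # the keyless toleration from the release note
print(tolerates(catch_all, Taint(key="any-key", value="x", effect="NoExecute")))
~~~

Because the keyless `Exists` toleration matches any key, value, and effect, it is effectively a wildcard, which is why it behaves differently from the usual keyed tolerations.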

      Issue: Elasticsearch cluster upgrade stuck.

      Findings:

      Pod elasticsearch-cdm-tnwarhmo-1 is running the latest image, while elasticsearch-cdm-tnwarhmo-2 and elasticsearch-cdm-tnwarhmo-3 are still on the old image.

      ~~~
        nodes:
        - deploymentName: elasticsearch-cdm-tnwarhmo-1
          upgradeStatus:
            scheduledUpgrade: "True"
            underUpgrade: "True"
            upgradePhase: preparationComplete
        - deploymentName: elasticsearch-cdm-tnwarhmo-2
          upgradeStatus:
            scheduledUpgrade: "True"
        - deploymentName: elasticsearch-cdm-tnwarhmo-3
          upgradeStatus:
            scheduledUpgrade: "True"
      ~~~
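
The status excerpt above can be read programmatically to see where each node is in the upgrade: only elasticsearch-cdm-tnwarhmo-1 reports `underUpgrade`, while the other two are merely scheduled, which is consistent with a node-by-node rollout that has not moved past the first node. This is a small Python sketch; the data is transcribed from the excerpt, and `summarize_upgrade` is a hypothetical helper that only looks at the fields shown here.

~~~python
from typing import Dict, List

# Transcription of the status excerpt above; real status objects carry more fields.
nodes: List[Dict] = [
    {"deploymentName": "elasticsearch-cdm-tnwarhmo-1",
     "upgradeStatus": {"scheduledUpgrade": "True", "underUpgrade": "True",
                       "upgradePhase": "preparationComplete"}},
    {"deploymentName": "elasticsearch-cdm-tnwarhmo-2",
     "upgradeStatus": {"scheduledUpgrade": "True"}},
    {"deploymentName": "elasticsearch-cdm-tnwarhmo-3",
     "upgradeStatus": {"scheduledUpgrade": "True"}},
]


def summarize_upgrade(nodes: List[Dict]) -> Dict[str, List[str]]:
    """Group nodes into in-progress, pending, and idle (hypothetical helper)."""
    summary: Dict[str, List[str]] = {"in_progress": [], "pending": [], "idle": []}
    for node in nodes:
        status = node.get("upgradeStatus", {})
        name = node["deploymentName"]
        if status.get("underUpgrade") == "True":
            # Include the reported phase so a stalled phase is easy to spot.
            phase = status.get("upgradePhase", "unknown")
            summary["in_progress"].append(f"{name} ({phase})")
        elif status.get("scheduledUpgrade") == "True":
            summary["pending"].append(name)
        else:
            summary["idle"].append(name)
    return summary


print(summarize_upgrade(nodes))
~~~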

      I confirmed which image is the latest by installing a test logging cluster.

      Elasticsearch Operator logs:

      ~~~
      {"_ts":"2022-01-103048.820353447Z","_level":"0","_component":"elasticsearch-operator","_message":"unable to update node","_error":{"msg":"timed out waiting for node to rollout","node":"elasticsearch-cdm-tnwarhmo-1"},"cluster":"elasticsearch","namespace":"openshift-logging"}
      ~~~
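
The operator log entry above is structured JSON, so the failing node can be extracted and matched against the `deploymentName` values in the status excerpt. This is a minimal Python sketch using only the standard `json` module; `describe_error` is a hypothetical helper, and the `_ts` field is omitted from the sample line.

~~~python
import json

# The operator log line from above, without its "_ts" field.
line = (
    '{"_level":"0","_component":"elasticsearch-operator",'
    '"_message":"unable to update node",'
    '"_error":{"msg":"timed out waiting for node to rollout",'
    '"node":"elasticsearch-cdm-tnwarhmo-1"},'
    '"cluster":"elasticsearch","namespace":"openshift-logging"}'
)


def describe_error(raw: str) -> str:
    """Return a one-line summary of an operator error entry (hypothetical helper)."""
    entry = json.loads(raw)
    err = entry.get("_error", {})
    return (f'{entry["namespace"]}/{entry["cluster"]}: {entry["_message"]} '
            f'[{err.get("node", "?")}: {err.get("msg", "")}]')


print(describe_error(line))
# openshift-logging/elasticsearch: unable to update node [elasticsearch-cdm-tnwarhmo-1: timed out waiting for node to rollout]
~~~

The node named in the error is the same node that the status reports as `underUpgrade`.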

      Pod 'elasticsearch-cdm-tnwarhmo-1' has already restarted and is up and running, so it is unclear why the operator is still stuck at 'timed out waiting for node to rollout'.
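
A running pod is not necessarily what a controller counts as a completed rollout. As one reference point, `kubectl rollout status` judges a Deployment by comparing `metadata.generation` with `status.observedGeneration` and then checking the updated, total, and available replica counts. The Python sketch below models those checks in simplified form. `DeploymentState` and `rollout_status` are illustrative names, and this issue does not show which conditions the Elasticsearch Operator itself evaluates, so treat this as an assumption rather than the operator's logic.

~~~python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class DeploymentState:
    """Subset of Deployment metadata/spec/status fields used by the check."""
    generation: int                # metadata.generation
    observed_generation: int       # status.observedGeneration
    replicas: int                  # spec.replicas
    updated_replicas: int = 0      # status.updatedReplicas
    status_replicas: int = 0       # status.replicas (old + new pods)
    available_replicas: int = 0    # status.availableReplicas
    progress_deadline_exceeded: bool = False  # Progressing condition timed out


def rollout_status(d: DeploymentState) -> Tuple[bool, str]:
    """Simplified model of the checks behind `kubectl rollout status`."""
    if d.observed_generation < d.generation:
        # The controller has not yet acted on the latest spec change.
        return False, "waiting for deployment spec update to be observed"
    if d.progress_deadline_exceeded:
        return False, "progress deadline exceeded"
    if d.updated_replicas < d.replicas:
        return False, "waiting for new replicas to be updated"
    if d.status_replicas > d.updated_replicas:
        return False, "old replicas are pending termination"
    if d.available_replicas < d.updated_replicas:
        return False, "waiting for updated replicas to become available"
    return True, "successfully rolled out"
~~~

Under this model, a Deployment whose spec keeps being rewritten never reaches the "successfully rolled out" branch while the controller lags behind, even though its pod may already be running.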

      I tried to reproduce this issue in the test cluster, but the upgrade completed successfully.

      ~~~
      [demeestg@pinocp01 ~]$ oc get sub,ip,csv -n openshift-logging
      NAME                                                PACKAGE           SOURCE             CHANNEL
      subscription.operators.coreos.com/cluster-logging   cluster-logging   redhat-operators   stable-5.3

      NAME                                             CSV                        APPROVAL    APPROVED
      installplan.operators.coreos.com/install-hlbhw   cluster-logging.5.3.0-55   Automatic   true
      installplan.operators.coreos.com/install-m5bvh   cluster-logging.5.3.2-20   Automatic   true
      installplan.operators.coreos.com/install-r6wfq   cluster-logging.5.3.1-12   Automatic   true

      NAME                                                                           DISPLAY                            VERSION    REPLACES                            PHASE
      clusterserviceversion.operators.coreos.com/cluster-logging.5.3.2-20            Red Hat OpenShift Logging          5.3.2-20   cluster-logging.5.3.1-12            Succeeded
      clusterserviceversion.operators.coreos.com/elasticsearch-operator.5.3.2-20     OpenShift Elasticsearch Operator   5.3.2-20   elasticsearch-operator.5.3.1-12     Succeeded
      clusterserviceversion.operators.coreos.com/redhat-openshift-pipelines.v1.5.2   Red Hat OpenShift Pipelines        1.5.2      redhat-openshift-pipelines.v1.4.1   Succeeded
      ~~~
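
The install plans in the test cluster list three `cluster-logging` versions. Sorting the CSV names by version makes the tested upgrade path explicit, which helps when comparing it with the path taken by the affected cluster. This is a small Python sketch; `version_key` and `upgrade_path` are illustrative helpers, and the sketch assumes CSV names end in `X.Y.Z-N` as they do in the output above.

~~~python
import re
from typing import List, Tuple

# CSV names taken from the install plans in the output above.
install_plan_csvs = [
    "cluster-logging.5.3.0-55",
    "cluster-logging.5.3.2-20",
    "cluster-logging.5.3.1-12",
]


def version_key(csv_name: str) -> Tuple[int, ...]:
    """Turn 'cluster-logging.5.3.2-20' into (5, 3, 2, 20) for numeric sorting."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)-(\d+)$", csv_name)
    if match is None:
        raise ValueError(f"unrecognised CSV name: {csv_name}")
    return tuple(int(part) for part in match.groups())


def upgrade_path(csvs: List[str]) -> str:
    """Join CSV names in ascending version order."""
    return " -> ".join(sorted(csvs, key=version_key))


print(upgrade_path(install_plan_csvs))
# cluster-logging.5.3.0-55 -> cluster-logging.5.3.1-12 -> cluster-logging.5.3.2-20
~~~

Numeric tuples are used instead of plain string sorting so that, for example, a build number of `-9` would not sort after `-12`.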

      Please let me know if any other information is required.

              gvanloo Gerard Vanloo (Inactive)
              rhn-support-aharchin Akhil Harchinder (Inactive)
              Qiaoling Tang Qiaoling Tang
              Votes: 0
              Watchers: 4
