OpenShift Logging / LOG-2302

[Logging 5.4] Elasticsearch cluster upgrade stuck

Details

    • False
    • False
    • NEW
    • VERIFIED
    • Before this update, defining a toleration with no key and the exists operator caused the operator to be unable to complete an upgrade. With this update, this toleration no longer blocks the upgrade from completing.
    • Logging (LogExp) - Sprint 214, Logging (LogExp) - Sprint 215
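The condition named in the release note, a toleration with no key and the exists operator, can be illustrated with a small check. This is a minimal sketch and not the operator's code: the function name `has_keyless_exists_toleration` and the sample list are hypothetical, and the dict shape only follows the standard Kubernetes toleration fields `key`, `operator`, and `effect`.

```python
def has_keyless_exists_toleration(tolerations):
    """Return True if any toleration has no key and operator Exists."""
    for tol in tolerations:
        key = tol.get("key") or ""  # treat a missing or null key as empty
        if key == "" and tol.get("operator") == "Exists":
            return True
    return False


# Hypothetical tolerations, shaped like entries in a pod spec.
sample = [
    {"key": "node-role.kubernetes.io/infra", "operator": "Exists", "effect": "NoSchedule"},
    {"operator": "Exists"},  # no key: in Kubernetes this tolerates every taint
]

print(has_keyless_exists_toleration(sample))      # second entry matches
print(has_keyless_exists_toleration(sample[:1]))  # keyed toleration only
```

Running a check like this against the tolerations configured for the log store is one way to tell whether a cluster is exposed to the behavior described above.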

    Description

      Issue: Elasticsearch cluster upgrade stuck.

       

      Findings:

      Pod elasticsearch-cdm-tnwarhmo-1 is running the latest image, while elasticsearch-cdm-tnwarhmo-2 and elasticsearch-cdm-tnwarhmo-3 are still on the old image.

       

      ~~~
        nodes:
        - deploymentName: elasticsearch-cdm-tnwarhmo-1
          upgradeStatus:
            scheduledUpgrade: "True"
            underUpgrade: "True"
            upgradePhase: preparationComplete
        - deploymentName: elasticsearch-cdm-tnwarhmo-2
          upgradeStatus:
            scheduledUpgrade: "True"
        - deploymentName: elasticsearch-cdm-tnwarhmo-3
          upgradeStatus:
            scheduledUpgrade: "True"
      ~~~
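The status block above can be summarized programmatically to see which node the upgrade is waiting on. This is a sketch under assumptions: the entries are copied by hand from the status shown in this ticket, and the helper `names_where` is a hypothetical name, not part of any OpenShift tooling.

```python
# Node status entries copied from the status block in this ticket.
nodes = [
    {
        "deploymentName": "elasticsearch-cdm-tnwarhmo-1",
        "upgradeStatus": {
            "scheduledUpgrade": "True",
            "underUpgrade": "True",
            "upgradePhase": "preparationComplete",
        },
    },
    {
        "deploymentName": "elasticsearch-cdm-tnwarhmo-2",
        "upgradeStatus": {"scheduledUpgrade": "True"},
    },
    {
        "deploymentName": "elasticsearch-cdm-tnwarhmo-3",
        "upgradeStatus": {"scheduledUpgrade": "True"},
    },
]


def names_where(nodes, field):
    """List deployments whose upgradeStatus has the given field set to "True"."""
    return [
        n["deploymentName"]
        for n in nodes
        if n.get("upgradeStatus", {}).get(field) == "True"
    ]


scheduled = names_where(nodes, "scheduledUpgrade")
in_progress = names_where(nodes, "underUpgrade")
print("scheduled:", scheduled)
print("in progress:", in_progress)
```

With the data from this ticket, all three deployments are scheduled for upgrade and only elasticsearch-cdm-tnwarhmo-1 is marked as under upgrade.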

       

      I confirmed which image is the latest by installing logging on a test cluster.

       

      Elasticsearch operator logs:

      ~~~

      {"_ts":"2022-01-103048.820353447Z","_level":"0","_component":"elasticsearch-operator","_message":"unable to update node","_error":{"msg":"timed out waiting for node to rollout","node":"elasticsearch-cdm-tnwarhmo-1"},"cluster":"elasticsearch","namespace":"openshift-logging"}
      ~~~
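The operator log entry is a single JSON object, so the failing node and reason can be extracted with a few lines of code. A minimal sketch, assuming the entry is available as one line of text: the function name `summarize` is hypothetical, and the `_ts` value is kept verbatim as it appears in this ticket.

```python
import json

# The log entry from this ticket, joined onto one line.
raw = (
    '{"_ts":"2022-01-103048.820353447Z","_level":"0",'
    '"_component":"elasticsearch-operator","_message":"unable to update node",'
    '"_error":{"msg":"timed out waiting for node to rollout",'
    '"node":"elasticsearch-cdm-tnwarhmo-1"},'
    '"cluster":"elasticsearch","namespace":"openshift-logging"}'
)


def summarize(line):
    """Extract the message, error reason, and node name from one log line."""
    entry = json.loads(line)
    err = entry.get("_error") or {}  # entries without an error yield None fields
    return {
        "message": entry.get("_message"),
        "reason": err.get("msg"),
        "node": err.get("node"),
    }


summary = summarize(raw)
print(summary)
```

Applied to a full operator log, the same function could be used to group timeouts by node.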

       

      Pod 'elasticsearch-cdm-tnwarhmo-1' has already restarted and is up and running, so it is unclear why the operator is stuck at 'timed out waiting for node to rollout'.

       

      I tried to reproduce this issue in a test cluster, but the upgrade completed successfully there.

       

      ~~~
      [demeestg@pinocp01 ~]$ oc get sub,ip,csv -n openshift-logging
      NAME                                                PACKAGE           SOURCE             CHANNEL
      subscription.operators.coreos.com/cluster-logging   cluster-logging   redhat-operators   stable-5.3

      NAME                                             CSV                        APPROVAL    APPROVED
      installplan.operators.coreos.com/install-hlbhw   cluster-logging.5.3.0-55   Automatic   true
      installplan.operators.coreos.com/install-m5bvh   cluster-logging.5.3.2-20   Automatic   true
      installplan.operators.coreos.com/install-r6wfq   cluster-logging.5.3.1-12   Automatic   true

      NAME                                                                           DISPLAY                            VERSION    REPLACES                            PHASE
      clusterserviceversion.operators.coreos.com/cluster-logging.5.3.2-20            Red Hat OpenShift Logging          5.3.2-20   cluster-logging.5.3.1-12            Succeeded
      clusterserviceversion.operators.coreos.com/elasticsearch-operator.5.3.2-20     OpenShift Elasticsearch Operator   5.3.2-20   elasticsearch-operator.5.3.1-12     Succeeded
      clusterserviceversion.operators.coreos.com/redhat-openshift-pipelines.v1.5.2   Red Hat OpenShift Pipelines        1.5.2      redhat-openshift-pipelines.v1.4.1   Succeeded
      ~~~

       

      Please let me know if any other information is required.

            People

              gvanloo Gerard Vanloo (Inactive)
              gvanloo Gerard Vanloo (Inactive)
              Qiaoling Tang Qiaoling Tang