AMQ Streams / ENTMQST-3840

ZooKeeper rolling update handling of unready pods


    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • 2.3.0.GA
    • None
    • None

      During a ZooKeeper rolling update, we do not sufficiently check the state of all the pods. As a result, in one reconciliation we can take down one of the ZooKeeper pods and wait for it to become ready. If it does not become ready, the reconciliation fails, reports an error, and ends. The next reconciliation ignores the pod that is not ready and moves on to the next pod, and so on. Given enough time, we take the whole ZooKeeper cluster down.

      One easy way to reproduce it:

      • Deploy a Kafka cluster
      • Set the ZooKeeper resources in the Kafka CR to some unrealistically high value
      • Let the operator deal with it

      => Given enough time, the operator rolls all 3 ZooKeeper pods into the Pending state. This is something we should fix, so marking it as a bug.
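
      The failure mode can be illustrated with a small simulation. This is a hypothetical sketch, not Strimzi code: the class and method names are made up for illustration, and it assumes that a restarted pod never becomes ready again (as with unschedulable resource requests) and that each reconciliation restarts the first pod that is still ready.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation, not Strimzi code: models the behaviour described in this issue.
class NaiveZooRollSimulation {
    /**
     * Runs the given number of reconciliations against a cluster in which a
     * restarted pod never becomes ready again (assumption for this sketch).
     * Returns the number of pods that are still ready afterwards.
     */
    static int readyPodsAfter(int podCount, int reconciliations) {
        List<Boolean> ready = new ArrayList<>();
        for (int i = 0; i < podCount; i++) {
            ready.add(true);
        }
        for (int r = 0; r < reconciliations; r++) {
            // Naive behaviour: ignore pods that are not ready and restart the next ready one.
            for (int i = 0; i < podCount; i++) {
                if (ready.get(i)) {
                    ready.set(i, false); // the restarted pod stays Pending
                    break;               // the readiness wait fails and the reconciliation ends
                }
            }
        }
        int count = 0;
        for (boolean isReady : ready) {
            if (isReady) {
                count++;
            }
        }
        return count;
    }
}
```

      Under these assumptions, each reconciliation removes one more ready pod, so a 3-pod cluster has no ready pods left after three reconciliations.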

      To fix this, the rolling update should first try to fix the already unready pods before moving on to the ready pods.
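
      A minimal sketch of the proposed ordering, assuming a simplified pod model. This is not the actual Strimzi implementation: the Pod class, rollOrder, roll, and the restartAndWait callback are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch, not the Strimzi implementation.
class UnreadyFirstRoller {
    // Simplified pod model (assumption): a name and a readiness flag.
    static final class Pod {
        final String name;
        final boolean ready;

        Pod(String name, boolean ready) {
            this.name = name;
            this.ready = ready;
        }

        boolean isReady() {
            return ready;
        }
    }

    /** Returns the pods with unready ones first; the sort is stable, so order within each group is kept. */
    static List<Pod> rollOrder(List<Pod> pods) {
        List<Pod> ordered = new ArrayList<>(pods);
        ordered.sort(Comparator.comparing(Pod::isReady)); // false sorts before true
        return ordered;
    }

    /**
     * Restarts pods in roll order and stops at the first pod that does not become ready.
     * restartAndWait stands in for "restart the pod and wait for readiness".
     * Returns the names of the pods that were restarted.
     */
    static List<String> roll(List<Pod> pods, Predicate<Pod> restartAndWait) {
        List<String> restarted = new ArrayList<>();
        for (Pod pod : rollOrder(pods)) {
            restarted.add(pod.name);
            if (!restartAndWait.test(pod)) {
                break; // do not touch further pods while this one is still unready
            }
        }
        return restarted;
    }
}
```

      With this ordering, a pod that cannot recover is retried first in every reconciliation, so the roll stops there and the remaining ready pods are left running.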

      Created by Strimzi#1001

            lkral Lukas Kral
            scholzj Jakub Scholz
            Lukas Kral
            Votes: 0
            Watchers: 2
