Type: Bug
Resolution: Done
Priority: Major
Fix Version/s: 2.3.0.GA
Affects Version/s: None
Component/s: None
Labels:
- day2

Blocked:
False
Ready:
False
Target Release:

2.3.0.GA
Git Pull Request:
https://github.com/strimzi/strimzi-kafka-operator/pull/7421

During a ZooKeeper rolling update, we do not sufficiently check the state of all the pods. As a result, one reconciliation can take down one of the ZooKeeper pods and wait for it to become ready. If the pod does not become ready, the reconciliation fails, reports an error, and ends. The next reconciliation ignores the pod that is not ready and moves on to the next pod, and so on. Given enough time, we take the whole ZooKeeper cluster down.

One easy way to reproduce it:

1. Deploy a Kafka cluster.
2. In the Kafka CR, set the ZooKeeper resources to an unrealistically high value.
3. Let the operator reconcile the change.

=> With enough time, the operator rolls all 3 ZooKeeper pods into the Pending state. This seems to be something we should fix, so marking it as a bug.
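The failure mode can be illustrated with a toy model. This is only a sketch and not Strimzi code: the pod names, the `naive_reconcile` function, and the `can_schedule` flag are all hypothetical and stand in for "a restarted pod cannot be scheduled because its resource requests are too high".

```python
# Toy model only: names and logic are illustrative, not the operator's code.

def naive_reconcile(pods, can_schedule):
    """Run one reconciliation over pods (name -> "Ready" or "Pending").

    Pods that are not ready are skipped; the first ready pod is restarted.
    Returns False if the restarted pod does not become ready again.
    """
    for name in sorted(pods):
        if pods[name] != "Ready":
            continue  # the flaw: a pod that is already unready is ignored
        # Restart the pod; it only comes back if it can be scheduled.
        pods[name] = "Ready" if can_schedule else "Pending"
        if pods[name] != "Ready":
            return False  # waiting for readiness fails, reconciliation ends
    return True


pods = {"zoo-0": "Ready", "zoo-1": "Ready", "zoo-2": "Ready"}
ready_counts = []
for _ in range(3):
    naive_reconcile(pods, can_schedule=False)
    ready_counts.append(sum(state == "Ready" for state in pods.values()))

print(ready_counts)
```

In this model each failed reconciliation costs one more ready pod, so three reconciliations leave no ready pods.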

To fix this, the operator should first try to fix the pods that are already unready before moving on to the ready pods.
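A minimal sketch of that ordering, under the same toy assumptions (hypothetical pod names, a hypothetical `safer_reconcile` function, and a `can_schedule` flag standing in for whether a restarted pod can become ready). It is not the change made in the linked pull request, only an illustration of handling unready pods first:

```python
# Toy model only: illustrates "unready pods first", not the actual fix.

def safer_reconcile(pods, can_schedule):
    """Run one reconciliation over pods (name -> "Ready" or "Pending").

    Unready pods are handled before ready ones, and the reconciliation
    stops as soon as any pod fails to become ready.
    """
    unready = [name for name in sorted(pods) if pods[name] != "Ready"]
    ready = [name for name in sorted(pods) if pods[name] == "Ready"]
    for name in unready + ready:
        # Restart the pod; it only comes back if it can be scheduled.
        pods[name] = "Ready" if can_schedule else "Pending"
        if pods[name] != "Ready":
            return False  # stop here, the remaining ready pods stay untouched
    return True


pods = {"zoo-0": "Ready", "zoo-1": "Ready", "zoo-2": "Ready"}
ready_counts = []
for _ in range(3):
    safer_reconcile(pods, can_schedule=False)
    ready_counts.append(sum(state == "Ready" for state in pods.values()))

print(ready_counts)
```

Because every later reconciliation retries the same unready pod first and fails there, the model never loses more than one pod, and the remaining two keep running.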

Created by Strimzi#1001

Assignee:: Lukas Kral

Reporter:: Jakub Scholz

Tester:: Lukas Kral

Votes:: 0

Watchers:: 2

Created:: 2022/03/03 3:50 PM

Updated:: 2022/12/15 3:09 PM

Resolved:: 2022/11/23 12:51 PM
