It looks like the KafkaRoller has an issue with topics whose replication factor is lower than the min.insync.replicas option. It waits until such a topic can be rolled safely, but never completes, because such a topic will never be safe to roll. Funnily enough, this also blocks the rolling update itself, so I cannot remove min.insync.replicas from the cluster config after running into the issue.
I ran into this with the following scenario:
- Have an auto-created topic with RF=1
- Set the cluster wide option min.insync.replicas to 2
- Trigger a rolling update for any reason
2020-05-06 12:41:26 INFO KafkaRoller:247 - Pod 0 needs to be restarted. Reason: Pod has old generation
2020-05-06 12:41:26 INFO KafkaAvailability:109 - my-topic2/0 is already underreplicated (|ISR|=1, min.insync.replicas=2); broker 0 has a replica, so should not be restarted right now (it might be first to catch up).
2020-05-06 12:41:26 INFO KafkaRoller:218 - Could not roll pod 0 due to io.strimzi.operator.cluster.operator.resource.KafkaRoller$UnforceableProblem: Pod my-cluster-kafka-0 is currently not rollable, retrying after at least 500ms
2020-05-06 12:41:27 INFO KafkaRoller:247 - Pod 0 needs to be restarted. Reason: Pod has old generation
2020-05-06 12:41:27 INFO KafkaAvailability:109 - my-topic2/0 is already underreplicated (|ISR|=1, min.insync.replicas=2); broker 0 has a replica, so should not be restarted right now (it might be first to catch up).
2020-05-06 12:41:27 INFO KafkaRoller:218 - Could not roll pod 0 due to io.strimzi.operator.cluster.operator.resource.KafkaRoller$UnforceableProblem: Pod my-cluster-kafka-0 is currently not rollable, retrying after at least 1000ms
2020-05-06 12:41:28 INFO KafkaRoller:247 - Pod 0 needs to be restarted. Reason: Pod has old generation
2020-05-06 12:41:28 INFO KafkaAvailability:109 - my-topic2/0 is already underreplicated (|ISR|=1, min.insync.replicas=2); broker 0 has a replica, so should not be restarted right now (it might be first to catch up).
2020-05-06 12:41:28 INFO KafkaRoller:218 - Could not roll pod 0 due to io.strimzi.operator.cluster.operator.resource.KafkaRoller$UnforceableProblem: Pod my-cluster-kafka-0 is currently not rollable, retrying after at least 2000ms
2020-05-06 12:41:30 INFO KafkaRoller:247 - Pod 0 needs to be restarted. Reason: Pod has old generation
2020-05-06 12:41:31 INFO KafkaAvailability:109 - my-topic2/0 is already underreplicated (|ISR|=1, min.insync.replicas=2); broker 0 has a replica, so should not be restarted right now (it might be first to catch up).
2020-05-06 12:41:31 INFO KafkaRoller:218 - Could not roll pod 0 due to io.strimzi.operator.cluster.operator.resource.KafkaRoller$UnforceableProblem: Pod my-cluster-kafka-0 is currently not rollable, retrying after at least 4000ms
2020-05-06 12:41:35 INFO KafkaRoller:247 - Pod 0 needs to be restarted. Reason: Pod has old generation
2020-05-06 12:41:35 INFO KafkaAvailability:109 - my-topic2/0 is already underreplicated (|ISR|=1, min.insync.replicas=2); broker 0 has a replica, so should not be restarted right now (it might be first to catch up).
2020-05-06 12:41:35 INFO KafkaRoller:218 - Could not roll pod 0 due to io.strimzi.operator.cluster.operator.resource.KafkaRoller$UnforceableProblem: Pod my-cluster-kafka-0 is currently not rollable, retrying after at least 8000ms
2020-05-06 12:41:43 INFO KafkaRoller:247 - Pod 0 needs to be restarted. Reason: Pod has old generation
2020-05-06 12:41:43 INFO KafkaAvailability:109 - my-topic2/0 is already underreplicated (|ISR|=1, min.insync.replicas=2); broker 0 has a replica, so should not be restarted right now (it might be first to catch up).
2020-05-06 12:41:43 INFO KafkaRoller:218 - Could not roll pod 0 due to io.strimzi.operator.cluster.operator.resource.KafkaRoller$UnforceableProblem: Pod my-cluster-kafka-0 is currently not rollable, retrying after at least 16000ms
Since it is obvious that we cannot keep the topic available when RF <= MIN-ISR, the KafkaRoller should just roll the pod regardless of such topics.
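To make the proposal concrete, here is a rough sketch of the check I have in mind. This is not the actual KafkaRoller or KafkaAvailability code: the class `RollabilitySketch`, the `Partition` type, and the methods `blocksRoll` and `canRoll` are all hypothetical names made up for illustration, and the real operator would get the RF and ISR values from the Kafka Admin client rather than from hand-built objects.

```java
import java.util.List;

// Hypothetical sketch, not Strimzi code: decides whether a broker may be rolled.
class RollabilitySketch {

    // Hypothetical view of one partition as seen from the broker we want to restart.
    static final class Partition {
        final String topic;
        final int id;
        final int replicationFactor;
        final int isrSize;
        final boolean brokerInIsr; // true if the broker to restart is in the ISR

        Partition(String topic, int id, int replicationFactor, int isrSize, boolean brokerInIsr) {
            this.topic = topic;
            this.id = id;
            this.replicationFactor = replicationFactor;
            this.isrSize = isrSize;
            this.brokerInIsr = brokerInIsr;
        }
    }

    // Returns true if restarting the broker should be postponed because of this partition.
    static boolean blocksRoll(Partition p, int minIsr) {
        // With RF <= min.insync.replicas the partition can never stay available
        // through a restart of one of its replicas, so waiting does not help.
        if (p.replicationFactor <= minIsr) {
            return false;
        }
        // Otherwise keep the usual rule: do not let |ISR| drop below min.insync.replicas.
        int isrAfterRestart = p.brokerInIsr ? p.isrSize - 1 : p.isrSize;
        return isrAfterRestart < minIsr;
    }

    // The broker can be rolled if no partition blocks the restart.
    static boolean canRoll(List<Partition> partitions, int minIsr) {
        return partitions.stream().noneMatch(p -> blocksRoll(p, minIsr));
    }
}
```

With the scenario above (my-topic2/0 with RF=1, |ISR|=1 and min.insync.replicas=2), `canRoll` returns true and the pod gets rolled, while a topic with RF=3 that has temporarily dropped to |ISR|=2 would still postpone the restart as it does today.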
Upstream issue Strimzi#2964