When non-empty nodes are scaled down, the scale-down is blocked and the nodes first need to be cleaned up, for example using the remove-brokers feature in Cruise Control. Once the scaled-down nodes are empty, the CO will execute the scale-down and delete them. But there seems to be room for a race condition between the KafkaAssemblyOperator and the KafkaRebalanceAssemblyOperator:
- The remove-brokers rebalance is ongoing; the KafkaRebalanceAssemblyOperator marks the KafkaRebalance resource as Rebalancing and periodically (every 2 minutes) checks the progress
- KafkaAssemblyOperator sees that the nodes are already empty and proceeds to scale down the broker and roll Cruise Control with the new cluster configuration
- Later (after the CC is rolled) the KafkaRebalanceAssemblyOperator starts another reconciliation round. But it seems that:
  - Cruise Control no longer accepts the request and throws an exception:
    com.linkedin.kafka.cruisecontrol.exception.KafkaCruiseControlException: java.lang.IllegalArgumentException: Broker 14 does not exist.
  - The KafkaRebalanceAssemblyOperator tries to recreate the task and seems to get stuck:
    colog | grep "#313(timer)"
    2024-09-23 21:13:37 INFO AbstractOperator:266 - Reconciliation #313(timer) KafkaRebalance(myproject/my-cluster-auto-rebalancing-remove-brokers): KafkaRebalance my-cluster-auto-rebalancing-remove-brokers will be checked for creation or modification
    2024-09-23 21:13:37 INFO KafkaRebalanceAssemblyOperator:317 - Reconciliation #313(timer) KafkaRebalance(myproject/my-cluster-auto-rebalancing-remove-brokers): Rebalance action is performed and KafkaRebalance resource is currently in [Rebalancing] state
    2024-09-23 21:13:37 INFO KafkaRebalanceAssemblyOperator:854 - Reconciliation #313(timer) KafkaRebalance(myproject/my-cluster-auto-rebalancing-remove-brokers): Getting Cruise Control rebalance user task status
    2024-09-23 21:13:37 WARN KafkaRebalanceAssemblyOperator:863 - Reconciliation #313(timer) KafkaRebalance(myproject/my-cluster-auto-rebalancing-remove-brokers): User task 670c1383-aa04-4979-8cc6-41fe9f69efce not found, going to generate a new proposal
    2024-09-23 21:13:37 INFO KafkaRebalanceAssemblyOperator:1113 - Reconciliation #313(timer) KafkaRebalance(myproject/my-cluster-auto-rebalancing-remove-brokers): Requesting Cruise Control rebalance [dryrun=true]
    2024-09-23 21:14:37 INFO AbstractOperator:401 - Reconciliation #313(timer) KafkaRebalance(myproject/my-cluster-auto-rebalancing-remove-brokers): Reconciliation is in progress
    2024-09-23 21:15:37 INFO AbstractOperator:401 - Reconciliation #313(timer) KafkaRebalance(myproject/my-cluster-auto-rebalancing-remove-brokers): Reconciliation is in progress
    2024-09-23 21:16:37 INFO AbstractOperator:401 - Reconciliation #313(timer) KafkaRebalance(myproject/my-cluster-auto-rebalancing-remove-brokers): Reconciliation is in progress
    2024-09-23 21:17:37 INFO AbstractOperator:401 - Reconciliation #313(timer) KafkaRebalance(myproject/my-cluster-auto-rebalancing-remove-brokers): Reconciliation is in progress
    2024-09-23 21:18:37 INFO AbstractOperator:401 - Reconciliation #313(timer) KafkaRebalance(myproject/my-cluster-auto-rebalancing-remove-brokers): Reconciliation is in progress
    2024-09-23 21:19:37 INFO AbstractOperator:401 - Reconciliation #313(timer) KafkaRebalance(myproject/my-cluster-auto-rebalancing-remove-brokers): Reconciliation is in progress
    2024-09-23 21:20:37 INFO AbstractOperator:401 - Reconciliation #313(timer) KafkaRebalance(myproject/my-cluster-auto-rebalancing-remove-brokers): Reconciliation is in progress
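To make the suspected ordering easier to reason about, here is a minimal Java sketch that replays the sequence against an in-memory stand-in. Everything in it is hypothetical: `FakeCruiseControl`, its methods, and the broker IDs other than 14 are illustrative only and are not the Strimzi or Cruise Control API. It also assumes that rolling Cruise Control drops its in-memory user tasks, which is what the "User task ... not found" warning suggests but has not been confirmed. The sketch only reproduces the rejected request, not the hang seen in the operator log.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

class RemoveBrokersRaceSketch {

    /** Hypothetical in-memory stand-in for Cruise Control; not the real API. */
    static class FakeCruiseControl {
        private final Set<Integer> brokers = new HashSet<>();
        private final Map<String, String> userTasks = new HashMap<>();

        FakeCruiseControl(Set<Integer> initialBrokers) {
            brokers.addAll(initialBrokers);
        }

        /** Starts a remove-brokers task and returns its user task ID. */
        String removeBrokers(Set<Integer> toRemove) {
            for (int id : toRemove) {
                if (!brokers.contains(id)) {
                    throw new IllegalArgumentException("Broker " + id + " does not exist.");
                }
            }
            String taskId = UUID.randomUUID().toString();
            userTasks.put(taskId, "InExecution");
            return taskId;
        }

        /** Returns the task state, or null when the task is unknown. */
        String userTaskStatus(String taskId) {
            return userTasks.get(taskId);
        }

        /** Models a roll: new broker list, task history lost (assumption). */
        void rollWithBrokers(Set<Integer> newBrokers) {
            brokers.clear();
            brokers.addAll(newBrokers);
            userTasks.clear();
        }
    }

    /** Replays the sequence from the report and returns the final outcome. */
    static String replay() {
        FakeCruiseControl cc = new FakeCruiseControl(Set.of(10, 11, 12, 13, 14));

        // Rebalance operator: starts the remove-brokers rebalance for broker 14.
        String taskId = cc.removeBrokers(Set.of(14));

        // Kafka operator: broker 14 is empty, so it scales down and rolls CC.
        cc.rollWithBrokers(Set.of(10, 11, 12, 13));

        // Rebalance operator: next periodic check of the old user task.
        if (cc.userTaskStatus(taskId) != null) {
            return "Task still tracked";
        }
        try {
            // Task not found, so a new proposal is requested for a removed broker.
            cc.removeBrokers(Set.of(14));
            return "New proposal accepted";
        } catch (IllegalArgumentException e) {
            return "User task not found; new proposal failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(replay());
    }
}
```

In this model the second request can never succeed, because the broker it targets was already removed by the scale-down. That suggests the KafkaRebalance should probably be treated as finished rather than retried, though that is a design question for the actual operators.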
Created by Strimzi#10631