AMQ Streams / ENTMQST-6342

Remove-brokers rebalancing seems to get stuck due to a race condition


    • Type: Bug
    • Resolution: Done
    • Priority: Undefined
    • 2.9.0.GA

      When non-empty nodes are scaled down, the scale-down is blocked and the nodes must first be emptied, for example by using the remove-brokers feature in Cruise Control. Once the nodes being scaled down are empty, the Cluster Operator (CO) executes the scale-down and deletes them. However, there seems to be room for a race condition between the KafkaAssemblyOperator and the KafkaRebalanceAssemblyOperator:

      • The remove-brokers rebalance is ongoing: the KafkaRebalanceAssemblyOperator marks the KafkaRebalance resource as Rebalancing and periodically (every 2 minutes) checks its progress
      • The KafkaAssemblyOperator sees that the nodes are already empty and proceeds to scale down the brokers and roll Cruise Control with the new cluster configuration
      • Later (after Cruise Control has been rolled), the KafkaRebalanceAssemblyOperator starts another reconciliation round, and it seems that:
        • Cruise Control no longer accepts the request and throws an exception:
             
                com.linkedin.kafka.cruisecontrol.exception.KafkaCruiseControlException: java.lang.IllegalArgumentException: Broker 14 does not exist.


        • The KafkaRebalanceAssemblyOperator tries to generate a new proposal and seems to get stuck:

          colog | grep "#313(timer)"
          2024-09-23 21:13:37 INFO  AbstractOperator:266 - Reconciliation #313(timer) KafkaRebalance(myproject/my-cluster-auto-rebalancing-remove-brokers): KafkaRebalance my-cluster-auto-rebalancing-remove-brokers will be checked for creation or modification
          2024-09-23 21:13:37 INFO  KafkaRebalanceAssemblyOperator:317 - Reconciliation #313(timer) KafkaRebalance(myproject/my-cluster-auto-rebalancing-remove-brokers): Rebalance action is performed and KafkaRebalance resource is currently in [Rebalancing] state
          2024-09-23 21:13:37 INFO  KafkaRebalanceAssemblyOperator:854 - Reconciliation #313(timer) KafkaRebalance(myproject/my-cluster-auto-rebalancing-remove-brokers): Getting Cruise Control rebalance user task status
          2024-09-23 21:13:37 WARN  KafkaRebalanceAssemblyOperator:863 - Reconciliation #313(timer) KafkaRebalance(myproject/my-cluster-auto-rebalancing-remove-brokers): User task 670c1383-aa04-4979-8cc6-41fe9f69efce not found, going to generate a new proposal
          2024-09-23 21:13:37 INFO  KafkaRebalanceAssemblyOperator:1113 - Reconciliation #313(timer) KafkaRebalance(myproject/my-cluster-auto-rebalancing-remove-brokers): Requesting Cruise Control rebalance [dryrun=true]
          2024-09-23 21:14:37 INFO  AbstractOperator:401 - Reconciliation #313(timer) KafkaRebalance(myproject/my-cluster-auto-rebalancing-remove-brokers): Reconciliation is in progress
          2024-09-23 21:15:37 INFO  AbstractOperator:401 - Reconciliation #313(timer) KafkaRebalance(myproject/my-cluster-auto-rebalancing-remove-brokers): Reconciliation is in progress
          2024-09-23 21:16:37 INFO  AbstractOperator:401 - Reconciliation #313(timer) KafkaRebalance(myproject/my-cluster-auto-rebalancing-remove-brokers): Reconciliation is in progress
          2024-09-23 21:17:37 INFO  AbstractOperator:401 - Reconciliation #313(timer) KafkaRebalance(myproject/my-cluster-auto-rebalancing-remove-brokers): Reconciliation is in progress
          2024-09-23 21:18:37 INFO  AbstractOperator:401 - Reconciliation #313(timer) KafkaRebalance(myproject/my-cluster-auto-rebalancing-remove-brokers): Reconciliation is in progress
          2024-09-23 21:19:37 INFO  AbstractOperator:401 - Reconciliation #313(timer) KafkaRebalance(myproject/my-cluster-auto-rebalancing-remove-brokers): Reconciliation is in progress
          2024-09-23 21:20:37 INFO  AbstractOperator:401 - Reconciliation #313(timer) KafkaRebalance(myproject/my-cluster-auto-rebalancing-remove-brokers): Reconciliation is in progress
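The interleaving above can be reproduced with a small, self-contained model. This is a sketch under stated assumptions, not Strimzi or Cruise Control code: `FakeCruiseControl`, `BrokerNotFound`, `reconcile_rebalance` and the `guard` flag are hypothetical names. The model assumes, as the "User task ... not found" warning suggests, that Cruise Control keeps user tasks only in memory, so rolling it forgets the in-flight remove-brokers task. Without the guard, the regenerated request fails with the same "Broker 14 does not exist." error; with it, the already-removed brokers are treated as the goal having been reached. The guard only illustrates one possible way out and is not necessarily how this issue was resolved.

```python
# Hypothetical, simplified model of the race; none of these names are real
# Strimzi or Cruise Control APIs.


class BrokerNotFound(Exception):
    """Stands in for Cruise Control's 'Broker N does not exist.' error."""


class FakeCruiseControl:
    def __init__(self, brokers):
        self.brokers = set(brokers)
        # Assumption: user tasks live in memory only and are lost on a roll.
        self.user_tasks = {}

    def remove_brokers(self, broker_ids, dryrun):
        # dryrun is accepted for fidelity but does not change this model.
        for b in broker_ids:
            if b not in self.brokers:
                raise BrokerNotFound(f"Broker {b} does not exist.")
        task_id = f"task-{len(self.user_tasks) + 1}"
        self.user_tasks[task_id] = "Active"
        return task_id

    def task_status(self, task_id):
        return self.user_tasks.get(task_id)  # None means "not found"


def reconcile_rebalance(cc, task_id, brokers_to_remove, cluster_brokers, guard=True):
    """One timer-driven check of a KafkaRebalance in the Rebalancing state."""
    if cc.task_status(task_id) is not None:
        return "Rebalancing"  # task still known: keep polling
    # Hypothetical guard: the brokers to remove are already gone from the cluster.
    if guard and not set(brokers_to_remove) & set(cluster_brokers):
        return "Ready"
    # Mirrors the log: "User task ... not found, going to generate a new proposal".
    cc.remove_brokers(brokers_to_remove, dryrun=True)
    return "PendingProposal"


if __name__ == "__main__":
    # 1. The remove-brokers rebalance starts while broker 14 still exists.
    cc = FakeCruiseControl(brokers=[0, 1, 2, 14])
    task = cc.remove_brokers([14], dryrun=False)
    print(reconcile_rebalance(cc, task, [14], cluster_brokers=[0, 1, 2, 14]))  # Rebalancing

    # 2. KafkaAssemblyOperator removes broker 14 and rolls Cruise Control,
    #    so the new instance no longer knows the user task.
    cc = FakeCruiseControl(brokers=[0, 1, 2])
    try:
        reconcile_rebalance(cc, task, [14], cluster_brokers=[0, 1, 2], guard=False)
    except BrokerNotFound as err:
        print(f"without guard: {err}")  # without guard: Broker 14 does not exist.
    print(reconcile_rebalance(cc, task, [14], cluster_brokers=[0, 1, 2]))  # Ready
```

The model collapses the two operators into explicit steps so the ordering is deterministic; in the real system the two reconciliation loops run independently, which is what makes the window possible.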

      Created by Strimzi#10631

              Paolo Patierno (ppatiern)
              Jakub Scholz (scholzj)
              Maros Orsak
