  1. AMQ Streams
  2. ENTMQST-3503

[DOC] Document Cruise Control CPU balancing bug


    • Type: Task
    • Resolution: Done
    • Priority: Major
    • 2.0.0.GA
    • documentation

      We have a Cruise Control bug that can prevent rebalances in common deployment setups. Luckily, we also have a workaround for the issue. We should add this workaround to the release notes.

      The bug is in the way Cruise Control estimates CPU utilization [1]. It can prevent cluster rebalances when the number of logical processors on a node is greater than the CPU limit of a Kafka broker pod on that node and the pod is under heavy load. I imagine this situation arises quite often; however, this bug also exists in our past releases that have CPU-based balancing enabled. We can get around the issue by disabling the CPU goals in the Kafka resource, like this:

      cruiseControl:
          config:
            hard.goals: >
              com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal,
              com.linkedin.kafka.cruisecontrol.analyzer.goals.MinTopicLeadersPerBrokerGoal,
              com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal,
              com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal,
              com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundCapacityGoal,
              com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundCapacityGoal
            default.goals: >
              com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal,
              com.linkedin.kafka.cruisecontrol.analyzer.goals.MinTopicLeadersPerBrokerGoal,
              com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal,
              com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal,
              com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundCapacityGoal,
              com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundCapacityGoal,
              com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaDistributionGoal,
              com.linkedin.kafka.cruisecontrol.analyzer.goals.PotentialNwOutGoal,
              com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskUsageDistributionGoal,
              com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundUsageDistributionGoal,
              com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundUsageDistributionGoal,
              com.linkedin.kafka.cruisecontrol.analyzer.goals.TopicReplicaDistributionGoal,
              com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderReplicaDistributionGoal,
              com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderBytesInDistributionGoal 
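
      For orientation, here is a rough sketch of where the fragment above would sit in a full Kafka resource, next to the broker CPU limit that triggers the bug. The apiVersion, cluster name, and CPU limit value are assumptions for illustration and are not taken from this issue. The two goal names in the comment are, as far as I can tell, the CPU goals that the lists above leave out.

```yaml
# Illustrative sketch only; values marked "hypothetical" are assumptions.
apiVersion: kafka.strimzi.io/v1beta2   # assumed API version
kind: Kafka
metadata:
  name: my-cluster                     # hypothetical cluster name
spec:
  kafka:
    resources:
      limits:
        cpu: "2"                       # hypothetical broker pod CPU limit; the bug can
                                       # appear when the node has more logical processors
                                       # than this limit and the pod is under heavy load
    # remaining broker configuration unchanged
  cruiseControl:
    config:
      # hard.goals and default.goals exactly as listed above, that is, without
      # com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuCapacityGoal and
      # com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuUsageDistributionGoal
```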

      [1] https://github.com/strimzi/strimzi-kafka-operator/issues/5951

              pmellor@redhat.com Paul Mellor
              kliberti Kyle Liberti