[DOC] Document Cruise Control CPU balancing bug

Details

    • Task
    • Resolution: Done
    • Major
    • 2.0.0.GA
    • None
    • documentation
    • None
    • False
    • False

    Description

      We have a Cruise Control bug that can prevent rebalances in common deployment setups. Luckily, we also have a workaround for the issue. We should add this workaround to the release notes.

      The bug is in the way Cruise Control estimates CPU utilization [1]. It can prevent cluster rebalances when the number of logical processors on a node is greater than the CPU limit of a Kafka broker pod on that node and the pod is under heavy load. I imagine this situation arises quite often; however, the bug is not new and already exists in our past releases that have CPU-based balancing enabled. We can work around the issue by disabling the CPU goals in the Kafka resource, like this:

      cruiseControl:
          config:
            hard.goals: >
              com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal,
              com.linkedin.kafka.cruisecontrol.analyzer.goals.MinTopicLeadersPerBrokerGoal,
              com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal,
              com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal,
              com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundCapacityGoal,
              com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundCapacityGoal
            default.goals: >
              com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal,
              com.linkedin.kafka.cruisecontrol.analyzer.goals.MinTopicLeadersPerBrokerGoal,
              com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal,
              com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal,
              com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundCapacityGoal,
              com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundCapacityGoal,
              com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaDistributionGoal,
              com.linkedin.kafka.cruisecontrol.analyzer.goals.PotentialNwOutGoal,
              com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskUsageDistributionGoal,
              com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundUsageDistributionGoal,
              com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundUsageDistributionGoal,
              com.linkedin.kafka.cruisecontrol.analyzer.goals.TopicReplicaDistributionGoal,
              com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderReplicaDistributionGoal,
              com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderBytesInDistributionGoal 
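
      For context, here is a rough sketch of where the snippet above sits in a complete Kafka custom resource. The apiVersion (kafka.strimzi.io/v1beta2) and the cluster name (my-cluster) are assumptions for illustration and are not taken from this issue; the other sections of the resource are omitted. The point is that both goal lists go under spec.cruiseControl.config and simply leave out the CPU-related goals (in Cruise Control these are, to my knowledge, CpuCapacityGoal and CpuUsageDistributionGoal).

```yaml
apiVersion: kafka.strimzi.io/v1beta2   # assumed API version
kind: Kafka
metadata:
  name: my-cluster                     # hypothetical cluster name
spec:
  # kafka, zookeeper, and entityOperator sections omitted for brevity
  cruiseControl:
    config:
      # Hard goals with the CPU capacity goal left out
      hard.goals: >
        com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal,
        com.linkedin.kafka.cruisecontrol.analyzer.goals.MinTopicLeadersPerBrokerGoal,
        com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal,
        com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal,
        com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundCapacityGoal,
        com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundCapacityGoal
      # default.goals follows the same pattern: copy the full list from the
      # workaround above, which also contains no CPU goals
```

      Every hard goal must also appear in default.goals, which the lists in the workaround already satisfy.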

      [1] https://github.com/strimzi/strimzi-kafka-operator/issues/5951

          People

            pmellor@redhat.com Paul Mellor
            kliberti Kyle Liberti