  1. A-MQ Messaging-as-a-Service
  2. ENTMQMAAS-2632

[#5281] Agent utilises unexpectedly high CPU/memory when large numbers of connections/addresses are defined.


    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • 1.7.0
    • None
    • address-controller
    • None

      In a customer case with many thousands of addresses and many thousands of connections/links, we see the Agent CPU continually close to the CPU limit (3000). This occasionally causes the liveness probe to fail, which sometimes leads to a Kubernetes restart of the container. The restarts cause service alerts and add latency in bringing new addresses to the ready state.

      The population of addresses is mostly stable, with only occasional creates and deletes. This churn doesn't seem to correlate with the CPU usage.

      Stats collection looks suspicious. Broker and router stats collections are driven from a JavaScript interval (10000 ms, not tunable). Nothing serialises the runs, so the next stats invocation can start before the previous one has finished. In this customer's case there are 3 routers and 4 brokers and the result sets will be large, so it is easy to imagine that either the broker stats work or the router stats work will take longer than 10 seconds.
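
One way to rule out overlapping runs is to schedule the next collection only after the previous one has settled, instead of using a fixed interval. The sketch below is illustrative only: `runSerialised` and its parameters are hypothetical names, not part of the agent code, and the cadence changes slightly because the delay is measured from the end of one run to the start of the next.

```javascript
// Hypothetical helper: run an async task periodically without overlap.
// The next run is scheduled only after the previous run has settled.
function runSerialised(task, delayMs) {
  let timer = null;
  let stopped = false;

  async function tick() {
    try {
      await task();
    } catch (err) {
      // A failed collection should not prevent future collections.
      console.error('periodic task failed:', err);
    } finally {
      if (!stopped) {
        timer = setTimeout(tick, delayMs);
      }
    }
  }

  timer = setTimeout(tick, delayMs);

  // Returns a function that cancels any pending run.
  return function stop() {
    stopped = true;
    clearTimeout(timer);
  };
}
```

With this shape, a stats run that takes 15 seconds delays the following run rather than stacking a second run on top of it.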

      Some of the processing of the broker/router stats result sets is done in a for loop (for all addresses, for all connections, etc.). This coding pattern may contribute to blocking the event loop.
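
A common mitigation for long synchronous loops in Node.js is to process the result set in slices and yield to the event loop between slices. The following is a minimal sketch, assuming the per-item work is synchronous; `forEachInChunks` and `chunkSize` are hypothetical names and do not appear in the agent.

```javascript
// Hypothetical helper: process a large array in slices, yielding to the
// event loop between slices so that timers and I/O callbacks can run.
async function forEachInChunks(items, chunkSize, fn) {
  if (!Number.isInteger(chunkSize) || chunkSize < 1) {
    throw new RangeError('chunkSize must be a positive integer');
  }
  for (let start = 0; start < items.length; start += chunkSize) {
    const end = Math.min(start + chunkSize, items.length);
    for (let i = start; i < end; i++) {
      fn(items[i], i);
    }
    if (end < items.length) {
      // setImmediate resumes the loop after pending I/O callbacks have run.
      await new Promise((resolve) => setImmediate(resolve));
    }
  }
}
```

The total work is unchanged, but callbacks such as a liveness probe handler get a chance to run between slices instead of waiting for the whole loop.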

      I also notice several unguarded log.debug lines whose arguments are computationally expensive to build and create garbage even when debug is turned off:

      log.debug('syncing broker %s with %j', broker.id, allocated.map(get_address));

      log.debug('[%s] checking addresses, desired=%j, actual=%j => delete %j and create %j', self.id, values(self.addresses).map(address_and_type), values(actual),
      stale.map(address_and_type), missing.map(address_and_type));
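
The arguments in calls like these are evaluated before `log.debug` is entered, so the `map` calls run whether or not debug output is enabled. A guard avoids that cost. The sketch below assumes a logger that exposes an `isDebugEnabled()` check; `createLogger`, `isDebugEnabled`, and `logBrokerSync` are hypothetical names, and the agent's real logger API may differ.

```javascript
// Hypothetical minimal logger; the agent's real logger API may differ.
// The sink defaults to console.log, which understands %s and %j in Node.js.
function createLogger(level, sink = console.log) {
  const debugEnabled = level === 'debug';
  return {
    isDebugEnabled: () => debugEnabled,
    debug: (format, ...args) => {
      if (debugEnabled) sink(format, ...args);
    },
  };
}

// Guarded call site: allocated.map() runs only when debug output is wanted.
function logBrokerSync(log, broker, allocated, getAddress) {
  if (log.isDebugEnabled()) {
    log.debug('syncing broker %s with %j', broker.id, allocated.map(getAddress));
  }
}
```

With debug disabled, the guarded call costs one function call and allocates no intermediate array, which matters when `allocated` holds many thousands of addresses.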

              Unassigned
              Keith Wall (keithbwall)
              David Kornel

                  Original Estimate: 2 days
                  Remaining Estimate: 2 days
                  Time Spent: Not Specified