A-MQ Messaging-as-a-Service / ENTMQMAAS-2677

Broker runs out of AIO IO contexts in the kernel at startup

    • Type: Bug
    • Resolution: Won't Do
    • Priority: Major
    • None
    • 1.6.1
    • broker-plugin
    • None
    • False
    • False
    • Undefined

      This problem appears to have begun after an upgrade from AMQ Online 1.6.1 to 1.6.2.

      On many occasions, broker pods are failing to start, with this error message:

      Auto tuning journal ...
      java.lang.RuntimeException: Cannot initialize queue:Resource temporarily unavailable
      	at org.apache.activemq.artemis.nativo.jlibaio.LibaioContext.newContext(Native Method) 

      The "resource" that is exhausted here is AIO IO contexts in the kernel. After a failure, sysctl shows the limit (fs.aio-max-nr) and the current usage (fs.aio-nr), which are disturbingly close:

      fs.aio-max-nr = 65536
      sysctl: fs.aio-nr = 65024

      However, these values appear in all brokers, whether they are working or not. The value of 65536 is the default in an OpenShift pod, although it is low even compared with a desktop Linux system. Presumably it's low because all pods on a worker node share the same kernel resources.
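To make the gap between the two figures explicit, here is a minimal sketch that parses sysctl output of the kind quoted above and computes the remaining headroom. The function names `parse_sysctl` and `aio_headroom` are hypothetical, chosen only for illustration; the sample text is the output from this report.

```python
import re

def parse_sysctl(text):
    """Parse 'key = value' lines, tolerating a leading 'sysctl: ' prefix."""
    values = {}
    for line in text.splitlines():
        match = re.match(r"\s*(?:sysctl:\s*)?([\w.\-]+)\s*=\s*(\d+)\s*$", line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values

def aio_headroom(values):
    """Headroom left under the limit: fs.aio-max-nr minus fs.aio-nr."""
    return values["fs.aio-max-nr"] - values["fs.aio-nr"]

# Sample taken from the sysctl output quoted in this report.
SAMPLE = """fs.aio-max-nr = 65536
sysctl: fs.aio-nr = 65024"""

values = parse_sysctl(SAMPLE)
print(aio_headroom(values))  # 65536 - 65024 = 512
```

This is only arithmetic on the quoted figures; it says nothing about which pods on the node are holding the contexts.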

      The normal use of IO contexts in AMQ 7 is controlled by the setting <journal-max-io>, whose default recently increased from 500 to 4096. However, even the higher value is far smaller than the limit in the pod.
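As a rough illustration, the two <journal-max-io> defaults can be compared against what is left under the limit rather than against the limit itself. This sketch assumes, as a simplification, that a request for N events needs N units of remaining headroom; the kernel's actual accounting may differ, and `fits_in_headroom` is a hypothetical helper, not part of the broker.

```python
def fits_in_headroom(requested_events, aio_nr, aio_max_nr):
    """Return True if requested_events fits in what is left under the limit.

    Assumption: each requested event consumes one unit of fs.aio-nr.
    """
    return aio_nr + requested_events <= aio_max_nr

# Figures from the sysctl output quoted in this report.
AIO_NR = 65024
AIO_MAX_NR = 65536

# Old and new defaults for <journal-max-io> mentioned above.
for journal_max_io in (500, 4096):
    ok = fits_in_headroom(journal_max_io, AIO_NR, AIO_MAX_NR)
    print(journal_max_io, "fits" if ok else "does not fit")
```

Under that assumption, the old default of 500 would fit in the 512 remaining and the new default of 4096 would not. This is not a diagnosis, since the same sysctl values are seen on working brokers too.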

      To be honest, I don't know if this is a problem with AMQ Online, or with the broker. The broker configuration generated by the AMQ Online operators seems perfectly sane but, at the same time, I've never seen this problem in a plain AMQ 7 installation.

            Assignee: Unassigned
            Reporter: Kevin Boone (rhn-support-kboone)