mod_cluster / MODCLUSTER-407

worker-timeout can cause httpd thread stalls

Details

    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • Fix Version/s: 1.2.9.Final, 1.3.1.Alpha1
    • Affects Version/s: 1.2.8.Final
    • None
    • None
    • Steps to Reproduce:

      1) Configure JBoss with worker-timeout="1" in the modcluster subsystem.
      2) Start httpd and JBoss. Run httpd on a multicore system (4+ cores).
      3) Confirm JBoss is reachable through httpd/mod_cluster, then kill JBoss so that the mod_cluster worker-timeout retry logic is used.
      4) Load httpd with highly concurrent request traffic destined for JBoss for some time.
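
Step 4 can be driven by any load tool. As a minimal sketch only, assuming Python 3 is available on a client machine, the following issues concurrent GET requests and counts how many took longer than a threshold. The function names, the request count, and the concurrency level are illustrative and are not taken from this report.

```python
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_once(url, timeout=300.0):
    """Issue one GET; return (status, elapsed_seconds). status is None if no HTTP response arrived."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            status = resp.status
    except urllib.error.HTTPError as exc:
        # An HTTP error response (for example a 5xx while the backend is down) still counts as finished.
        status = exc.code
    except (urllib.error.URLError, OSError):
        # Connection failure or client-side timeout.
        status = None
    return status, time.monotonic() - start


def run_load(fetch, total_requests, concurrency):
    """Call fetch() total_requests times using `concurrency` worker threads; return all results."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(lambda _: fetch(), range(total_requests)))


def count_stalled(results, threshold_seconds):
    """Count requests whose elapsed time exceeded the threshold."""
    return sum(1 for _status, elapsed in results if elapsed > threshold_seconds)
```

For example, run_load(lambda: fetch_once("http://localhost/app/"), 10000, 200) followed by count_stalled(results, 1.0) would report how many requests exceeded the expected ~1 second; the URL here is a placeholder for whatever context httpd proxies to JBoss.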

      Then check for stalled requests/threads. Each request should finish within about 1 second, but once stalled it can take minutes. You can check the access logs with %T to see response times once the requests complete, use pstack to inspect the threads, or look at the mod_status page (it will show many threads in W state, with the number of seconds since their requests started continuing to grow).
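
The access-log check can be automated. The sketch below assumes a LogFormat in which %T (time taken to serve the request, in whole seconds) is the last whitespace-separated field on each line; that layout and the function name are assumptions for illustration, not something stated in this report.

```python
def slow_requests(lines, threshold_seconds=1):
    """Return (seconds, line) pairs for log lines whose trailing %T value exceeds the threshold.

    Assumes %T is the final whitespace-separated field; lines that do not
    end in an integer are skipped rather than guessed at.
    """
    slow = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        try:
            seconds = int(fields[-1])
        except ValueError:
            continue
        if seconds > threshold_seconds:
            slow.append((seconds, line.rstrip("\n")))
    return slow
```

Passing an open access-log file object to slow_requests() yields the requests that ran past the threshold. Note that %T is only written when a request finishes, so threads that are still stalled will not appear in the log yet.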

    • Workaround:

      Do not set worker-timeout.


    Description

      Setting a modcluster worker-timeout can stall requests and threads on the httpd side when requests are received while the workers are in a down state. A stack trace of a stalled thread looks like the following (recursive calls through mod_proxy_cluster from frame #160 down to frame #2):

      #0 0x00007ff8eb547533 in select () from /lib64/libc.so.6
      #1 0x00007ff8eba39185 in apr_sleep () from /usr/lib64/libapr-1.so.0
      #2 0x00007ff8e84be0d1 in ?? () from /etc/httpd/modules/mod_proxy_cluster.so
      ...
      #160 0x00007ff8e84beb9f in ?? () from /etc/httpd/modules/mod_proxy_cluster.so
      #161 0x00007ff8e88d2116 in proxy_run_pre_request () from /etc/httpd/modules/mod_proxy.so
      #162 0x00007ff8e88d9186 in ap_proxy_pre_request () from /etc/httpd/modules/mod_proxy.so
      #163 0x00007ff8e88d63c2 in ?? () from /etc/httpd/modules/mod_proxy.so

          People

            rhn-engineering-jclere Jean-Frederic Clere
            rhn-support-aogburn Aaron Ogburn