Uploaded image for project: 'JBoss Enterprise Application Platform 6'
  1. JBoss Enterprise Application Platform 6
  2. JBPAPP6-1407

mod_cluster: HTTP 503 on node shutdown with pure IPv6 setup

    Details

      Description

      Jitka Kozana reported she had experienced HTTP 503 errors with mod_cluster & pure IPv6 setup on node shutdown. I have done some investigation and I can reproduce the problem.

      Http client

      I have a curl client issuing requests to [2620:52:0:105f::ffff:c]:8888/SessionTest/hostname periodically, delay being 1 s. Note that there is always a new session for each request (no JSESSIONID stuff anywhere). There are two nodes I switch off and on randomly, always giving enough time so as the starting one may take off safely. After several dozens of this start-ups and shut-downs a 503 error occurs.

      Wed May 30 14:07:15 EDT 2012 [Balancer]:8888 0
      Wed May 30 14:07:16 EDT 2012 [Balancer]:8888 0
      Wed May 30 14:07:17 EDT 2012 HTTP Status 503 - This application is not currently available
      Wed May 30 14:07:18 EDT 2012 [Balancer]:8888 0
      Wed May 30 14:07:19 EDT 2012 [Balancer]:8888 0
      

      please, note the time 14:07:17 as it is very well synchronized with the other node and it will show us what happened.

      Worker nodes

      Node vmg36 [2620:52:0:105f::ffff:3]

      running normally, there were requests served by it before and after the incident.

      14:07:11,920 INFO  [org.jboss.test.cluster.servlet.SessionHostnameServlet] (ajp-/2620:52:0:105f:0:0:ffff:0%2:8009-3) Setting up new session - QpLxi52+8NmJNFMW2YDZm2MT.dc7bd552-a020-3d08-acee-4ae3e0f178a8
      14:07:12,985 INFO  [org.jboss.test.cluster.servlet.SessionHostnameServlet] (ajp-/2620:52:0:105f:0:0:ffff:0%2:8009-1) Setting up new session - 16FUOxeUglbTql5+PzEJSLno.dc7bd552-a020-3d08-acee-4ae3e0f178a8
      14:07:18,187 INFO  [org.jboss.as.clustering] (Incoming-20,null) JBAS010225: New cluster view for partition web (id: 8, delta: -1, merge: false) : [vmg36/web]
      14:07:18,192 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-20,null) ISPN000094: Received new cluster view: [vmg36/web|8] [vmg36/web]
      14:07:18,342 INFO  [org.jboss.test.cluster.servlet.SessionHostnameServlet] (ajp-/2620:52:0:105f:0:0:ffff:0%2:8009-2) Setting up new session - gdiFjJDwfFUSL2vlkqweUGzi.dc7bd552-a020-3d08-acee-4ae3e0f178a8
      

      Node vmg35 [2620:52:0:105f::ffff:c]

      Please, note the moment when ^C was pressed at ^C14:07:17,110:

      14:07:15,113 INFO  [org.jboss.test.cluster.servlet.SessionHostnameServlet] (ajp-/2620:52:0:105f:0:0:ffff:c%2:8009-8) Setting up new session - XGBdEy3g22SQZI0nyOQT7kRF.f49689d6-cdbb-3015-a642-f8200ea456ff
      14:07:16,184 INFO  [org.jboss.test.cluster.servlet.SessionHostnameServlet] (ajp-/2620:52:0:105f:0:0:ffff:c%2:8009-7) Setting up new session - 9l8RklnPOpvG5D2BKRHNjKV2.f49689d6-cdbb-3015-a642-f8200ea456ff
      ^C14:07:17,110 INFO  [org.apache.coyote.http11.Http11Protocol] (MSC service thread 1-3) Pausing Coyote HTTP/1.1 on http-/2620:52:0:105f:0:0:ffff:c%2:8080
      14:07:17,122 INFO  [org.apache.coyote.http11.Http11Protocol] (MSC service thread 1-3) Stopping Coyote HTTP/1.1 on http-/2620:52:0:105f:0:0:ffff:c%2:8080
      14:07:17,291 INFO  [org.jboss.as.logging] JBAS011503: Restored bootstrap log handlers
      14:07:17,272 INFO  [org.jboss.modcluster.ModClusterService] All active sessions drained from context [/SessionTest] in 0 seconds
      14:07:17,408 INFO  [org.apache.coyote.ajp.AjpProtocol] Pausing Coyote AJP/1.3 on ajp-/2620:52:0:105f:0:0:ffff:c%2:8009
      14:07:17,421 INFO  [org.apache.coyote.ajp.AjpProtocol] Stopping Coyote AJP/1.3 on ajp-/2620:52:0:105f:0:0:ffff:c%2:8009
      +++
      14:07:17,810 INFO  [org.infinispan.eviction.PassivationManagerImpl] ISPN000029: Passivating all entries to disk
      14:07:17,811 INFO  [org.infinispan.eviction.PassivationManagerImpl] ISPN000030: Passivated 0 entries in 0 milliseconds
      14:07:17,848 INFO  [org.infinispan.eviction.PassivationManagerImpl] ISPN000029: Passivating all entries to disk
      14:07:17,869 INFO  [com.arjuna.ats.jbossatx] ARJUNA032018: Destroying TransactionManagerService
      14:07:17,873 INFO  [org.infinispan.eviction.PassivationManagerImpl] ISPN000030: Passivated 1 entries in 24 milliseconds
      14:07:17,895 INFO  [com.arjuna.ats.jbossatx] ARJUNA032014: Stopping transaction recovery manager
      14:07:17,911 INFO  [org.jboss.as.clustering.infinispan] JBAS010282: Stopped default-host/SessionTest cache from web container
      14:07:17,925 INFO  [org.jboss.as.clustering.infinispan] JBAS010282: Stopped repl cache from web container
      14:07:18,005 INFO  [org.jboss.as.server.deployment] JBAS015877: Stopped deployment SessionTest.war in 960ms
      14:07:18,152 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] ISPN000082: Stopping the RpcDispatcher
      14:07:18,506 INFO  [org.jboss.as] JBAS015950: JBoss EAP 6.0.0.GA (AS 7.1.2.Final-redhat-1) stopped in 1424ms
      

      HTTPD

      In Access log, there is the 503 (in the middle) as the aforementioned http client has shown already:

      2620:52:0:105f::ffff:3 - - [30/May/2012:14:07:13 -0400] "STATUS / HTTP/1.1" 200 85 "-" "ClusterListener/1.0"
      2620:52:0:105f:5054:ff:fe52:e008 - - [30/May/2012:14:07:15 -0400] "GET /SessionTest/hostname HTTP/1.1" 200 31 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.12.9.0 zlib/1.2.3 libidn/1.18 libssh2/1.2.2"
      2620:52:0:105f:5054:ff:fe52:e008 - - [30/May/2012:14:07:16 -0400] "GET /SessionTest/hostname HTTP/1.1" 200 31 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.12.9.0 zlib/1.2.3 libidn/1.18 libssh2/1.2.2"
      2620:52:0:105f::ffff:c - - [30/May/2012:14:07:17 -0400] "DISABLE-APP / HTTP/1.1" 200 - "-" "ClusterListener/1.0"
      2620:52:0:105f::ffff:c - - [30/May/2012:14:07:17 -0400] "STOP-APP / HTTP/1.1" 200 107 "-" "ClusterListener/1.0"
      2620:52:0:105f:5054:ff:fe52:e008 - - [30/May/2012:14:07:17 -0400] "GET /SessionTest/hostname HTTP/1.1" 503 1114 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.12.9.0 zlib/1.2.3 libidn/1.18 libssh2/1.2.2"
      2620:52:0:105f::ffff:c - - [30/May/2012:14:07:17 -0400] "REMOVE-APP / HTTP/1.1" 200 - "-" "ClusterListener/1.0"
      2620:52:0:105f::ffff:c - - [30/May/2012:14:07:17 -0400] "REMOVE-APP /* HTTP/1.1" 200 - "-" "ClusterListener/1.0"
      2620:52:0:105f:5054:ff:fe52:e008 - - [30/May/2012:14:07:18 -0400] "GET /SessionTest/hostname HTTP/1.1" 200 31 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.12.9.0 zlib/1.2.3 libidn/1.18 libssh2/1.2.2"
      2620:52:0:105f:5054:ff:fe52:e008 - - [30/May/2012:14:07:19 -0400] "GET /SessionTest/hostname HTTP/1.1" 200 31 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.12.9.0 zlib/1.2.3 libidn/1.18 libssh2/1.2.2"
      

      finally, we may observe the exact part of the Error log that differs from the healthy behaviour:

      [Wed May 30 14:07:17 2012] [debug] ajp_header.c(224): Into ajp_marshal_into_msgb
      [Wed May 30 14:07:17 2012] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[0] [User-Agent] = [curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.12.9.0 zlib/1.2.3 libidn/1.18 libssh2/1.2.2]
      [Wed May 30 14:07:17 2012] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[1] [Host] = [[2620:52:0:105f::ffff:c]:8888]
      [Wed May 30 14:07:17 2012] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[2] [Accept] = [*/*]
      [Wed May 30 14:07:17 2012] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[3] [Via] = [1.1 proxy-01.mw.lab.eng.bos.redhat.com (squid/3.1.10)]
      [Wed May 30 14:07:17 2012] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[4] [X-Forwarded-For] = [2620:52:0:105f:5054:ff:fe9e:282e]
      [Wed May 30 14:07:17 2012] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[5] [Cache-Control] = [max-age=259200]
      [Wed May 30 14:07:17 2012] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[6] [Connection] = [keep-alive]
      [Wed May 30 14:07:17 2012] [debug] ajp_header.c(450): ajp_marshal_into_msgb: Done
      [Wed May 30 14:07:17 2012] [debug] mod_proxy_ajp.c(265): proxy: APR_BUCKET_IS_EOS
      [Wed May 30 14:07:17 2012] [debug] mod_proxy_ajp.c(270): proxy: data to read (max 8186 at 4)
      [Wed May 30 14:07:17 2012] [debug] mod_proxy_ajp.c(285): proxy: got 0 bytes of data
      [Wed May 30 14:07:17 2012] [debug] mod_manager.c(1817): manager_trans DISABLE-APP (/)
      [Wed May 30 14:07:17 2012] [debug] mod_manager.c(2473): manager_handler DISABLE-APP (/) processing: "JVMRoute=f49689d6-cdbb-3015-a642-f8200ea456ff&Alias=default-host%2Clocalhost%2Cexample.com&Context=%2FSessionTest"
      [Wed May 30 14:07:17 2012] [debug] mod_manager.c(2522): manager_handler DISABLE-APP  OK
      [Wed May 30 14:07:17 2012] [debug] mod_manager.c(1817): manager_trans STOP-APP (/)
      [Wed May 30 14:07:17 2012] [debug] mod_manager.c(2473): manager_handler STOP-APP (/) processing: "JVMRoute=f49689d6-cdbb-3015-a642-f8200ea456ff&Alias=default-host%2Clocalhost%2Cexample.com&Context=%2FSessionTest"
      [Wed May 30 14:07:17 2012] [debug] mod_manager.c(2522): manager_handler STOP-APP  OK
      [Wed May 30 14:07:17 2012] [debug] ajp_header.c(687): ajp_read_header: ajp_ilink_received 04
      [Wed May 30 14:07:17 2012] [debug] ajp_header.c(697): ajp_parse_type: got 04
      [Wed May 30 14:07:17 2012] [debug] ajp_header.c(516): ajp_unmarshal_response: status = 503
      [Wed May 30 14:07:17 2012] [debug] ajp_header.c(537): ajp_unmarshal_response: Number of headers is = 2
      [Wed May 30 14:07:17 2012] [debug] ajp_header.c(599): ajp_unmarshal_response: Header[0] [Content-Type] = [text/html;charset=utf-8]
      [Wed May 30 14:07:17 2012] [debug] ajp_header.c(609): ajp_unmarshal_response: ap_set_content_type done
      [Wed May 30 14:07:17 2012] [debug] ajp_header.c(599): ajp_unmarshal_response: Header[1] [Content-Length] = [1114]
      [Wed May 30 14:07:17 2012] [debug] ajp_header.c(687): ajp_read_header: ajp_ilink_received 03
      [Wed May 30 14:07:17 2012] [debug] ajp_header.c(697): ajp_parse_type: got 03
      [Wed May 30 14:07:17 2012] [debug] ajp_header.c(687): ajp_read_header: ajp_ilink_received 05
      [Wed May 30 14:07:17 2012] [debug] ajp_header.c(697): ajp_parse_type: got 05
      [Wed May 30 14:07:17 2012] [debug] mod_proxy_ajp.c(577): proxy: got response from [2620:52:0:105f::ffff:c]:8009 (2620:52:0:105f:0:0:ffff:c%2)
      [Wed May 30 14:07:17 2012] [debug] proxy_util.c(2029): proxy: AJP: has released connection for (2620:52:0:105f:0:0:ffff:c%2)
      [Wed May 30 14:07:17 2012] [debug] mod_manager.c(1817): manager_trans REMOVE-APP (/)
      [Wed May 30 14:07:17 2012] [debug] mod_manager.c(2473): manager_handler REMOVE-APP (/) processing: "JVMRoute=f49689d6-cdbb-3015-a642-f8200ea456ff&Alias=default-host%2Clocalhost%2Cexample.com&Context=%2FSessionTest"
      [Wed May 30 14:07:17 2012] [debug] mod_manager.c(2522): manager_handler REMOVE-APP  OK
      [Wed May 30 14:07:17 2012] [debug] mod_manager.c(1817): manager_trans REMOVE-APP (/*)
      [Wed May 30 14:07:17 2012] [debug] mod_manager.c(2473): manager_handler REMOVE-APP (/NODE_COMMAND) processing: "JVMRoute=f49689d6-cdbb-3015-a642-f8200ea456ff"
      [Wed May 30 14:07:17 2012] [debug] mod_manager.c(1169): process_node_cmd 4 processing node: 2
      [Wed May 30 14:07:17 2012] [debug] mod_manager.c(2522): manager_handler REMOVE-APP  OK
      [Wed May 30 14:07:17 2012] [debug] mod_proxy_cluster.c(678): update_workers_node starting
      

      It looks like the manager_handler did not make it in time so as to avoid the choice of wrong node.
      Note: Does the fact that nodes are talking via proxy-01.mw.lab.eng.bos.redhat.com (squid/3.1.10) anything to do with the problem on hand?

      Configuration

      Configs are pretty simple, no silly tricks here:

      HTTPD

      <IfModule manager_module>
      Listen [2620:52:0:105f::ffff:c]:8888
      <VirtualHost [2620:52:0:105f::ffff:c]:8888>
         ServerName vmg35
         ServerAlias *.vmg35
        <Directory />
          Order deny,allow
          Deny from all
          Allow from all
        </Directory>
        AdvertiseGroup [ff01::1]:23364 
        ServerAdvertise off
        KeepAliveTimeout 60
        MaxKeepAliveRequests 0
       
        ManagerBalancerName mbabaceksCluster
        AdvertiseFrequency 5
        EnableMCPMReceive
      
        <Location /mcm>
           SetHandler mod_cluster-manager
           Order deny,allow
           Deny from all
           Allow from all
        </Location>
      </VirtualHost>
      </IfModule>
      

      AS7

      <subsystem xmlns="urn:jboss:domain:modcluster:1.1">
        <mod-cluster-config advertise="false" proxy-list="[2620:52:0:105f::ffff:c]:8888" sticky-session="true" sticky-session-force="false" sticky-session-remove="false" connector="ajp" >
          <dynamic-load-provider>
            <load-metric type="busyness"/>
          </dynamic-load-provider>
        </mod-cluster-config>
      </subsystem>
      

      IPv4

      I tried to reproduce this problem on IPv4 with the exactly same instances and configurations (only IP addresses being different) and I was not able to get any 503 errors.
      There are some error-free IPv4 logs attached for comparison:

      • access_log-IPv4-OK.zip
      • error_log-IPv4-OK.zip
      • error_log-IPv6-NOK.zip
      • perf07-Ctrl+C-log-IPv4.zip
      • perf08-Ctrl+C-log-IPv4.zip

        Gliffy Diagrams

          Attachments

          1. access_log-IPv4-OK.zip
            2 kB
          2. error_log-IPv4-OK.zip
            62 kB
          3. error_log-IPv6-NOK.zip
            3 kB
          4. perf07-Ctrl+C-log-IPv4.zip
            15 kB
          5. perf08-Ctrl+C-log-IPv4.zip
            15 kB

            Issue Links

              Activity

                People

                • Assignee:
                  pferraro Paul Ferraro
                  Reporter:
                  mbabacek Michal Karm
                • Votes:
                  0 Vote for this issue
                  Watchers:
                  4 Start watching this issue

                  Dates

                  • Created:
                    Updated:
                    Resolved: