Uploaded image for project: 'mod_cluster'
  1. mod_cluster
  2. MODCLUSTER-785

mod_proxy_cluster does not recover http member if websocket is enabled

XMLWordPrintable

    • Hide

      1. start apache with mod_proxy_cluster and websocket enabled
      2. start jboss connected with httpd and an app deployed
      3. start a loop to check HTTP response code

      while true; do curl -s -o /dev/null -w "%{http_code} " http://localhost/app/; date; sleep 1; done
      

      It must print 200 (and date) for all lines
      4. kill Jboss using kill -9 on JVM pid. The loop should then print 503 (and date)
      5. restart JBoss
      6. we expect the loop printing 200 some seconds after JBoss start, but it's 503 forever

      Show
      1. start apache with mod_proxy_cluster and websocket enabled 2. start jboss connected with httpd and an app deployed 3. start a loop to check HTTP response code while true ; do curl -s -o /dev/ null -w "%{http_code} " http: //localhost/app/; date; sleep 1; done It must print 200 (and date) for all lines 4. kill Jboss using kill -9 on JVM pid. The loop should then print 503 (and date) 5. restart JBoss 6. we expect the loop printing 200 some seconds after JBoss start, but it's 503 forever

      mod_proxy_cluster does not recover http member after a 503 error (like connection refusal) if websocket is enabled. It continues to fail with the below as the http worker remains in an error state even though STATUS pings are recovered:

      [Tue Jun 13 16:35:54.094677 2023] [:debug] [pid 140848:tid 140952] mod_proxy_cluster.c(2898): proxy_cluster_trans with cache
      [Tue Jun 13 16:35:54.094727 2023] [authz_core:debug] [pid 140848:tid 140952] mod_authz_core.c(844): [client 127.0.0.1:45398] AH01628: authorization result: granted (no directives)
      [Tue Jun 13 16:35:54.094771 2023] [:debug] [pid 140848:tid 140952] mod_proxy_cluster.c(2025): proxy: byrequests balancer skipping BAD worker
      [Tue Jun 13 16:35:54.094782 2023] [:debug] [pid 140848:tid 140952] mod_proxy_cluster.c(2141): proxy: byrequests balancer DONE (ws://127.0.0.1:8080)
      [Tue Jun 13 16:35:54.094790 2023] [:debug] [pid 140848:tid 140952] mod_proxy_cluster.c(2155): proxy: byrequests balancer Using http://127.0.0.1:8080 instead ws://127.0.0.1:8080
      [Tue Jun 13 16:35:54.094802 2023] [proxy:debug] [pid 140848:tid 140952] proxy_util.c(2119): AH00924: worker http://127.0.0.1:8080 shared already initialized
      [Tue Jun 13 16:35:54.094805 2023] [proxy:debug] [pid 140848:tid 140952] proxy_util.c(2176): AH00926: worker http://127.0.0.1:8080 local already initialized
      [Tue Jun 13 16:35:54.094812 2023] [proxy:debug] [pid 140848:tid 140952] mod_proxy.c(1514): [client 127.0.0.1:45398] AH01143: Running scheme balancer handler (attempt 0)
      [Tue Jun 13 16:35:54.094815 2023] [proxy_fcgi:debug] [pid 140848:tid 140952] mod_proxy_fcgi.c(1056): [client 127.0.0.1:45398] AH01076: url: http://127.0.0.1:8080/sleep/?sleep=1 proxyname: (null) proxyport: 0
      [Tue Jun 13 16:35:54.094819 2023] [proxy_fcgi:debug] [pid 140848:tid 140952] mod_proxy_fcgi.c(1059): [client 127.0.0.1:45398] AH01077: declining URL http://127.0.0.1:8080/sleep/?sleep=1
      [Tue Jun 13 16:35:54.094823 2023] [proxy:debug] [pid 140848:tid 140952] proxy_util.c(2269): AH00933: http: too soon to retry worker for (127.0.0.1)
      [Tue Jun 13 16:35:54.094826 2023] [proxy:error] [pid 140848:tid 140952] AH00940: http: disabled connection for (127.0.0.1)
      [Tue Jun 13 16:35:54.094839 2023] [:debug] [pid 140848:tid 140952] mod_proxy_cluster.c(2025): proxy: byrequests balancer skipping BAD worker
      [Tue Jun 13 16:35:54.094842 2023] [:debug] [pid 140848:tid 140952] mod_proxy_cluster.c(2141): proxy: byrequests balancer DONE (ws://127.0.0.1:8080)
      [Tue Jun 13 16:35:54.094845 2023] [:debug] [pid 140848:tid 140952] mod_proxy_cluster.c(2155): proxy: byrequests balancer Using http://127.0.0.1:8080 instead ws://127.0.0.1:8080
      [Tue Jun 13 16:35:54.094853 2023] [proxy:debug] [pid 140848:tid 140952] proxy_util.c(2119): AH00924: worker http://127.0.0.1:8080 shared already initialized
      [Tue Jun 13 16:35:54.094855 2023] [proxy:debug] [pid 140848:tid 140952] proxy_util.c(2176): AH00926: worker http://127.0.0.1:8080 local already initialized
      [Tue Jun 13 16:35:54.094858 2023] [proxy:debug] [pid 140848:tid 140952] mod_proxy.c(1514): [client 127.0.0.1:45398] AH01143: Running scheme balancer handler (attempt 1)
      [Tue Jun 13 16:35:54.094861 2023] [proxy_fcgi:debug] [pid 140848:tid 140952] mod_proxy_fcgi.c(1056): [client 127.0.0.1:45398] AH01076: url: http://127.0.0.1:8080/sleep/?sleep=1 proxyname: (null) proxyport: 0
      [Tue Jun 13 16:35:54.094864 2023] [proxy_fcgi:debug] [pid 140848:tid 140952] mod_proxy_fcgi.c(1059): [client 127.0.0.1:45398] AH01077: declining URL http://127.0.0.1:8080/sleep/?sleep=1
      [Tue Jun 13 16:35:54.094867 2023] [proxy:debug] [pid 140848:tid 140952] proxy_util.c(2269): AH00933: http: too soon to retry worker for (127.0.0.1)
      [Tue Jun 13 16:35:54.094870 2023] [proxy:error] [pid 140848:tid 140952] AH00940: http: disabled connection for (127.0.0.1)
      [Tue Jun 13 16:35:54.094877 2023] [:debug] [pid 140848:tid 140952] mod_proxy_cluster.c(3795): proxy_cluster_post_request 503 for (balancer://rect)
      

      If sending no more traffic for 60 seconds, the http worker seems to finally reach the default retry interval and then it can recover:

      [Tue Jun 13 16:43:00.182846 2023] [proxy:debug] [pid 140849:tid 141052] proxy_util.c(2263): AH00932: http: worker for (127.0.0.1) has been marked for retry
      [Tue Jun 13 16:43:00.182852 2023] [proxy:debug] [pid 140849:tid 141052] proxy_util.c(2533): AH00942: http: has acquired connection for (127.0.0.1)
      [Tue Jun 13 16:43:00.182857 2023] [proxy:debug] [pid 140849:tid 141052] proxy_util.c(2588): [client 127.0.0.1:59796] AH00944: connecting http://127.0.0.1:8080/sleep/?sleep=1 to 127.0.0.1:8080
      [Tue Jun 13 16:43:00.182862 2023] [proxy:debug] [pid 140849:tid 141052] proxy_util.c(2814): [client 127.0.0.1:59796] AH00947: connected /sleep/?sleep=1 to 127.0.0.1:8080
      [Tue Jun 13 16:43:00.182950 2023] [proxy:debug] [pid 140849:tid 141052] proxy_util.c(3284): AH02824: http: connection established with 127.0.0.1:8080 (127.0.0.1)
      [Tue Jun 13 16:43:00.182970 2023] [proxy:debug] [pid 140849:tid 141052] proxy_util.c(3468): AH00962: http: connection complete to 127.0.0.1:8080 (127.0.0.1)
      [Tue Jun 13 16:43:01.674403 2023] [proxy:debug] [pid 140849:tid 141052] proxy_util.c(2548): AH00943: http: has released connection for (127.0.0.1)
      

      It looks like the STATUS pings use the ws worker and only recover this one automatically once the JVM is back up:

      [Tue Jun 13 16:35:52.679087 2023] [:debug] [pid 140849:tid 141003] mod_manager.c(2066): Processing STATUS
      [Tue Jun 13 16:35:52.679200 2023] [:debug] [pid 140849:tid 141003] mod_proxy_cluster.c(275): Created: reusing worker for ws://127.0.0.1:8080
      [Tue Jun 13 16:35:52.679223 2023] [proxy:debug] [pid 140849:tid 141003] proxy_util.c(2263): AH00932: ws: worker for (127.0.0.1) has been marked for retry
      

      So we need to recover both the ws and http worker here.

              rhn-engineering-jclere Jean-Frederic Clere
              rhn-support-aogburn Aaron Ogburn
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

                Created:
                Updated:
                Resolved: