  1. JBoss Core Services
  2. JBCS-382

mod_cluster segmentation fault when creating worker


    • Type: Bug
    • Resolution: Won't Do
    • Priority: Critical
    • mod_cluster-native

      A crash occurs in mod_cluster when updating a worker node:

      (gdb) bt
      #0  __strncmp_ssse3 () at ../sysdeps/x86_64/strcmp.S:1737
      #1  0x00007f2a30f13b06 in create_worker (node=0x7f2a41316c48, pool=0x7f2a43558b38) at mod_proxy_cluster.c:372
      #2  add_balancers_workers (node=0x7f2a41316c48, pool=0x7f2a43558b38) at mod_proxy_cluster.c:875
      #3  0x00007f2a30f149df in update_workers_node (conf=<value optimized out>, pool=0x7f2a43558b38, server=0x7f2a43564bc0, check=<value optimized out>) at mod_proxy_cluster.c:1061
      #4  0x00007f2a30f18c93 in proxy_cluster_child_init (p=0x7f2a434aeb08, s=0x7f2a43564bc0) at mod_proxy_cluster.c:2703
      #5  0x00007f2a4147cbac in ap_run_child_init (pchild=0x7f2a434aeb08, s=0x7f2a43564bc0) at config.c:167
      #6  0x00007f2a3767d422 in child_main (child_num_arg=0) at worker.c:1242
      #7  0x00007f2a3767e5a8 in make_child (s=0x7f2a43564bc0, slot=0) at worker.c:1424
      #8  0x00007f2a3767f24c in perform_idle_server_maintenance (_pconf=<value optimized out>, plog=<value optimized out>, s=<value optimized out>) at worker.c:1632
      #9  server_main_loop (_pconf=<value optimized out>, plog=<value optimized out>, s=<value optimized out>) at worker.c:1758
      #10 worker_run (_pconf=<value optimized out>, plog=<value optimized out>, s=<value optimized out>) at worker.c:1829
      #11 0x00007f2a41462b6e in ap_run_mpm (pconf=0x7f2a4345d138, plog=0x7f2a4348a358, s=0x7f2a43564bc0) at mpm_common.c:98
      #12 0x00007f2a4145cc01 in main (argc=7, argv=0x7ffcd8412d78) at main.c:777
      

      rhn-support-hokuda has reviewed this in more detail; his findings follow:

      -------------------------------------------------------------------
      (gdb) bt
      #0  __strncmp_ssse3 () at ../sysdeps/x86_64/strcmp.S:1737
      #1  0x00007f2a30f13b06 in create_worker (node=0x7f2a41316c48, pool=0x7f2a43558b38) at mod_proxy_cluster.c:372
      #2  add_balancers_workers (node=0x7f2a41316c48, pool=0x7f2a43558b38) at mod_proxy_cluster.c:875
      #3  0x00007f2a30f149df in update_workers_node (conf=<value optimized out>, pool=0x7f2a43558b38, server=0x7f2a43564bc0, 
          check=<value optimized out>) at mod_proxy_cluster.c:1061
      #4  0x00007f2a30f18c93 in proxy_cluster_child_init (p=0x7f2a434aeb08, s=0x7f2a43564bc0) at mod_proxy_cluster.c:2703
      #5  0x00007f2a4147cbac in ap_run_child_init (pchild=0x7f2a434aeb08, s=0x7f2a43564bc0) at config.c:167
      #6  0x00007f2a3767d422 in child_main (child_num_arg=0) at worker.c:1242
      #7  0x00007f2a3767e5a8 in make_child (s=0x7f2a43564bc0, slot=0) at worker.c:1424
      #8  0x00007f2a3767f24c in perform_idle_server_maintenance (_pconf=<value optimized out>, plog=<value optimized out>, 
          s=<value optimized out>) at worker.c:1632
      #9  server_main_loop (_pconf=<value optimized out>, plog=<value optimized out>, s=<value optimized out>) at worker.c:1758
      #10 worker_run (_pconf=<value optimized out>, plog=<value optimized out>, s=<value optimized out>) at worker.c:1829
      #11 0x00007f2a41462b6e in ap_run_mpm (pconf=0x7f2a4345d138, plog=0x7f2a4348a358, s=0x7f2a43564bc0) at mpm_common.c:98
      #12 0x00007f2a4145cc01 in main (argc=7, argv=0x7ffcd8412d78) at main.c:777
      (gdb) frame
      #1  0x00007f2a30f13b06 in create_worker (node=0x7f2a41316c48, pool=0x7f2a43558b38) at mod_proxy_cluster.c:372
      372         if (strncmp(worker->s->name, shared->name, sizeof(worker->s->name))) {
      -------------------------------------------------------------------
      
      
      According to mod_proxy.h:
      
      
      -------------------------------------------------------------------
      /* Runtime worker status informations. Shared in scoreboard */
      typedef struct {
          char      name[PROXY_WORKER_MAX_NAME_SIZE];
          ...
      } proxy_worker_shared;
      -------------------------------------------------------------------
      
      
      Since name is the first member of proxy_worker_shared, worker->s->name has the same address as worker->s.
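      The point that the first member shares the struct's address can be checked with a small sketch. The type demo_worker_shared and the size DEMO_MAX_NAME_SIZE below are hypothetical stand-ins, not the real mod_proxy.h definitions; only the position of name as the first member is taken from the excerpt above.

```c
#include <stddef.h>

/* Hypothetical stand-in for proxy_worker_shared. The array size is an
 * assumed placeholder; only "name is the first member" comes from the
 * mod_proxy.h excerpt. */
#define DEMO_MAX_NAME_SIZE 96

typedef struct {
    char name[DEMO_MAX_NAME_SIZE];
    int  other_fields;            /* stands in for the elided members */
} demo_worker_shared;

/* Returns 1 when the name array starts at the same address as the struct. */
static int name_aliases_struct(const demo_worker_shared *s)
{
    return (const void *)s->name == (const void *)s;
}

/* Byte offset of name inside the struct; 0 for a first member. */
static size_t name_offset(void)
{
    return offsetof(demo_worker_shared, name);
}
```

      So reading worker->s->name touches exactly the memory that worker->s points to, which is why a bad worker->s makes strncmp fault immediately.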
      
      
      -------------------------------------------------------------------
      (gdb) print worker.s
      $7 = (proxy_worker_shared *) 0x7f2a8231c404
      (gdb) print *worker.s
      Cannot access memory at address 0x7f2a8231c404 <== caused the segfault
      -------------------------------------------------------------------
      
      
      worker->s is calculated as follows:
      
      
      -------------------------------------------------------------------
      static apr_status_t create_worker(proxy_server_conf *conf, proxy_balancer *balancer,
                                server_rec *server,
                                nodeinfo_t *node, apr_pool_t *pool)
      #if AP_MODULE_MAGIC_AT_LEAST(20101223,1)
      {
          ...
          char *ptr;
          ...
          ptr = (char *) node;
          ptr = ptr + node->offset;
          ...
          worker->s = (proxy_worker_shared *) ptr;
      -------------------------------------------------------------------
      
      
      Let's see 'node':
      
      
      -------------------------------------------------------------------
      (gdb) print node
      $3 = (nodeinfo_t *) 0x7f2a41316c48
      (gdb) print node.offset
      $4 = 1090541500
      (gdb) print/x 0x7f2a41316c48 + 1090541500
      $6 = 0x7f2a8231c404
      (gdb) print worker.s
      $7 = (proxy_worker_shared *) 0x7f2a8231c404
      -------------------------------------------------------------------
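      As a cross-check of the gdb arithmetic, the following sketch redoes the address calculation with plain 64-bit integers. The helper name shared_address is hypothetical; the input values are the ones printed by gdb above, and 320 is the expected offset derived later in this analysis.

```c
#include <stdint.h>

/* Mirrors "ptr = (char *) node; ptr = ptr + node->offset;" using
 * integers, so no invalid pointer is ever formed or dereferenced. */
static uint64_t shared_address(uint64_t node_addr, uint64_t offset)
{
    return node_addr + offset;
}

/* Values observed in the core dump. */
static const uint64_t NODE_ADDR       = 0x7f2a41316c48ULL;
static const uint64_t OBSERVED_OFFSET = 1090541500ULL;
```

      With the observed offset the result is 0x7f2a8231c404, matching worker.s; with an offset of 320 it would be 0x7f2a41316d88, which lies right after the node record.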
      
      
      According to node.c, nodeinfo_t.offset is calculated as follows:
      
      
      -------------------------------------------------------------------
      [hokuda@dhcp-193-78 mod_cluster-1.3.1.Final]$ ag offset
      native/mod_manager/node.c
      107:        ou->offset = sizeof(nodemess_t) + sizeof(apr_time_t) + sizeof(int);
      108:        ou->offset = APR_ALIGN_DEFAULT(ou->offset);
      142:    ou->offset = sizeof(nodemess_t) + sizeof(apr_time_t) + sizeof(int);
      143:    ou->offset = APR_ALIGN_DEFAULT(ou->offset);
      -------------------------------------------------------------------
      
      
      So, the value of nodeinfo_t.offset should be 320:
      
      
      -------------------------------------------------------------------
      (gdb) print sizeof(nodemess_t)
      $16 = 304
      (gdb) print sizeof(apr_time_t)
      $17 = 8
      (gdb) print sizeof(int)
      $18 = 4
      (gdb) print sizeof(nodemess_t) + sizeof(apr_time_t) + sizeof(int)
      $19 = 316
      (gdb) print (316 + 8 - 1) & ~(8-1)
      $24 = 320
      -------------------------------------------------------------------
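      The same rounding can be written as a small C helper. This is only a sketch of the arithmetic typed into gdb above, assuming an 8-byte default alignment; align_up and expected_node_offset are hypothetical names, not APR or mod_cluster functions.

```c
#include <stddef.h>

/* Round size up to the next multiple of boundary.
 * boundary must be a power of two. */
static size_t align_up(size_t size, size_t boundary)
{
    return (size + boundary - 1) & ~(boundary - 1);
}

/* Expected nodeinfo_t.offset, using the sizes gdb reported for this
 * core: sizeof(nodemess_t) = 304, sizeof(apr_time_t) = 8,
 * sizeof(int) = 4. */
static size_t expected_node_offset(void)
{
    return align_up(304 + 8 + 4, 8);
}
```

      Here 316 rounds up to 320, and a value that is already a multiple of 8 is left unchanged.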
      
      
      But the value actually stored is:
      
      
      -------------------------------------------------------------------
      (gdb) print node.offset
      $4 = 1090541500
      -------------------------------------------------------------------
      
      
      Additionally, the value 1090541500 is not 8-byte aligned:
      
      
      -------------------------------------------------------------------
      (gdb) print (1090541500 + 8 - 1) & ~(8-1)
      $27 = 1090541504
      -------------------------------------------------------------------
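      A sanity check along these lines could flag such a corrupted offset before it is used. This is a minimal sketch, not code from mod_cluster; is_aligned and offset_looks_sane are hypothetical helpers, and the expected value is passed in by the caller.

```c
/* Returns 1 when value is a multiple of boundary (a power of two). */
static int is_aligned(unsigned long value, unsigned long boundary)
{
    return (value & (boundary - 1)) == 0;
}

/* Returns 1 only when the stored offset is 8-byte aligned and equals
 * the value the caller computed from the structure sizes. */
static int offset_looks_sane(unsigned long offset, unsigned long expected)
{
    return is_aligned(offset, 8) && offset == expected;
}
```

      For the values in this report, 320 passes both tests, while 1090541500 fails the alignment test alone (its remainder modulo 8 is 4).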
      
      .....
      
      I noticed that node.offset equals the lower 32 bits of an address in the middle of the pcre_exec function:
      
      
      (gdb) print node.offset
      $4 = 1090541500 (=0x410057bc)
      
      (gdb) x/16  0x7f2a41316c48+316-8-4
      0x7f2a41316d78: 0x00000000      0x00000000      0x410057bc      0x00007f2a <==!!!
      0x7f2a41316d88: 0x00000000      0x00000000      0x00000000      0x00007f2a
      0x7f2a41316d98: 0x00000000      0x00007f2a      0x00000000      0x00000000
      0x7f2a41316da8: 0x00000000      0x00000000      0x00000000      0x00000000
      
      (gdb) disassemble 0x00007f2a410057bc                                                                                                    
      Dump of assembler code for function pcre_exec:
         ...
         0x00007f2a410057b7 <+2551>:  callq  0x7f2a40ffcfb0 <match>
         0x00007f2a410057bc <+2556>:  cmp    $0xfffffc1b,%eax      <=====!!!!
         0x00007f2a410057c1 <+2561>:  je     0x7f2a4100592d <pcre_exec+2925>
         ...
      
      
      Since this address is the instruction immediately after a callq, I think it is related to the stack (callq pushes the address of the next instruction onto the stack as the return address).
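      To make the match explicit, the sketch below splits the 64-bit code address into its two 32-bit halves, which is how x/16 displays it on this little-endian x86_64 machine. The helper names low32 and high32 are hypothetical; the address is the one from the disassembly above.

```c
#include <stdint.h>

/* Lower 32 bits of a 64-bit value: what a 4-byte int field sees when
 * an 8-byte pointer is written over it on a little-endian machine. */
static uint32_t low32(uint64_t value)
{
    return (uint32_t)(value & 0xffffffffULL);
}

/* Upper 32 bits: the word that follows in memory. */
static uint32_t high32(uint64_t value)
{
    return (uint32_t)(value >> 32);
}

/* Address of the instruction following the callq in pcre_exec. */
static const uint64_t RETURN_ADDR = 0x00007f2a410057bcULL;
```

      The low half is 0x410057bc, which is 1090541500 in decimal (the observed node.offset), and the high half is 0x00007f2a, the word shown next to it in the memory dump.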
      

              rhn-engineering-jclere Jean-Frederic Clere
              rhn-support-rbost Robert Bost
              Karm Karm Karm Karm