- Bug
- Resolution: Won't Do
- Critical
- None
- None
- None
Crash occurring in mod_cluster when updating worker nodes:
(gdb) bt
#0  __strncmp_ssse3 () at ../sysdeps/x86_64/strcmp.S:1737
#1  0x00007f2a30f13b06 in create_worker (node=0x7f2a41316c48, pool=0x7f2a43558b38) at mod_proxy_cluster.c:372
#2  add_balancers_workers (node=0x7f2a41316c48, pool=0x7f2a43558b38) at mod_proxy_cluster.c:875
#3  0x00007f2a30f149df in update_workers_node (conf=<value optimized out>, pool=0x7f2a43558b38, server=0x7f2a43564bc0, check=<value optimized out>) at mod_proxy_cluster.c:1061
#4  0x00007f2a30f18c93 in proxy_cluster_child_init (p=0x7f2a434aeb08, s=0x7f2a43564bc0) at mod_proxy_cluster.c:2703
#5  0x00007f2a4147cbac in ap_run_child_init (pchild=0x7f2a434aeb08, s=0x7f2a43564bc0) at config.c:167
#6  0x00007f2a3767d422 in child_main (child_num_arg=0) at worker.c:1242
#7  0x00007f2a3767e5a8 in make_child (s=0x7f2a43564bc0, slot=0) at worker.c:1424
#8  0x00007f2a3767f24c in perform_idle_server_maintenance (_pconf=<value optimized out>, plog=<value optimized out>, s=<value optimized out>) at worker.c:1632
#9  server_main_loop (_pconf=<value optimized out>, plog=<value optimized out>, s=<value optimized out>) at worker.c:1758
#10 worker_run (_pconf=<value optimized out>, plog=<value optimized out>, s=<value optimized out>) at worker.c:1829
#11 0x00007f2a41462b6e in ap_run_mpm (pconf=0x7f2a4345d138, plog=0x7f2a4348a358, s=0x7f2a43564bc0) at mpm_common.c:98
#12 0x00007f2a4145cc01 in main (argc=7, argv=0x7ffcd8412d78) at main.c:777
rhn-support-hokuda has reviewed this in more detail, and I will share his findings below:
-------------------------------------------------------------------
(gdb) bt
#0  __strncmp_ssse3 () at ../sysdeps/x86_64/strcmp.S:1737
#1  0x00007f2a30f13b06 in create_worker (node=0x7f2a41316c48, pool=0x7f2a43558b38) at mod_proxy_cluster.c:372
#2  add_balancers_workers (node=0x7f2a41316c48, pool=0x7f2a43558b38) at mod_proxy_cluster.c:875
#3  0x00007f2a30f149df in update_workers_node (conf=<value optimized out>, pool=0x7f2a43558b38, server=0x7f2a43564bc0, check=<value optimized out>) at mod_proxy_cluster.c:1061
#4  0x00007f2a30f18c93 in proxy_cluster_child_init (p=0x7f2a434aeb08, s=0x7f2a43564bc0) at mod_proxy_cluster.c:2703
#5  0x00007f2a4147cbac in ap_run_child_init (pchild=0x7f2a434aeb08, s=0x7f2a43564bc0) at config.c:167
#6  0x00007f2a3767d422 in child_main (child_num_arg=0) at worker.c:1242
#7  0x00007f2a3767e5a8 in make_child (s=0x7f2a43564bc0, slot=0) at worker.c:1424
#8  0x00007f2a3767f24c in perform_idle_server_maintenance (_pconf=<value optimized out>, plog=<value optimized out>, s=<value optimized out>) at worker.c:1632
#9  server_main_loop (_pconf=<value optimized out>, plog=<value optimized out>, s=<value optimized out>) at worker.c:1758
#10 worker_run (_pconf=<value optimized out>, plog=<value optimized out>, s=<value optimized out>) at worker.c:1829
#11 0x00007f2a41462b6e in ap_run_mpm (pconf=0x7f2a4345d138, plog=0x7f2a4348a358, s=0x7f2a43564bc0) at mpm_common.c:98
#12 0x00007f2a4145cc01 in main (argc=7, argv=0x7ffcd8412d78) at main.c:777
(gdb) frame
#1  0x00007f2a30f13b06 in create_worker (node=0x7f2a41316c48, pool=0x7f2a43558b38) at mod_proxy_cluster.c:372
372         if (strncmp(worker->s->name, shared->name, sizeof(worker->s->name))) {
-------------------------------------------------------------------
According to mod_proxy.h:
-------------------------------------------------------------------
/* Runtime worker status informations. Shared in scoreboard */
typedef struct {
    char name[PROXY_WORKER_MAX_NAME_SIZE];
    ...
} proxy_worker_shared;
-------------------------------------------------------------------
Since name is the first member, worker->s->name is located at the same address as worker->s itself.
-------------------------------------------------------------------
(gdb) print worker.s
$7 = (proxy_worker_shared *) 0x7f2a8231c404
(gdb) print *worker.s
Cannot access memory at address 0x7f2a8231c404   <== caused the segfault
-------------------------------------------------------------------
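To make the layout argument concrete, here is a minimal standalone C sketch (not mod_cluster code; the PROXY_WORKER_MAX_NAME_SIZE value and the second member are placeholders) showing that the first member of a struct shares the struct's address, which is why the strncmp() at line 372 faults as soon as worker->s points at unmapped memory:
-------------------------------------------------------------------
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define PROXY_WORKER_MAX_NAME_SIZE 96   /* placeholder value, for illustration only */

typedef struct {
    char name[PROXY_WORKER_MAX_NAME_SIZE];  /* first member: offset 0 */
    int  status;                            /* stand-in for the remaining fields */
} proxy_worker_shared_sketch;

int main(void)
{
    proxy_worker_shared_sketch s;

    /* The name array starts exactly where the struct starts ... */
    assert(offsetof(proxy_worker_shared_sketch, name) == 0);
    assert((void *)s.name == (void *)&s);

    /* ... so strncmp(worker->s->name, ...) reads from the address stored in
     * worker->s itself; if that pointer is invalid, the very first read faults. */
    printf("struct at %p, name at %p\n", (void *)&s, (void *)s.name);
    return 0;
}
-------------------------------------------------------------------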
worker->s is calculated as follows:
-------------------------------------------------------------------
static apr_status_t create_worker(proxy_server_conf *conf, proxy_balancer *balancer,
                                  server_rec *server, nodeinfo_t *node, apr_pool_t *pool)
#if AP_MODULE_MAGIC_AT_LEAST(20101223,1)
{
    ...
    char *ptr;
    ...
    ptr = (char *) node;
    ptr = ptr + node->offset;
    ...
    worker->s = (proxy_worker_shared *) ptr;
-------------------------------------------------------------------
Let's see 'node':
-------------------------------------------------------------------
(gdb) print node
$3 = (nodeinfo_t *) 0x7f2a41316c48
(gdb) print node.offset
$4 = 1090541500
(gdb) print/x 0x7f2a41316c48 + 1090541500
$6 = 0x7f2a8231c404
(gdb) print worker.s
$7 = (proxy_worker_shared *) 0x7f2a8231c404
-------------------------------------------------------------------
According to node.c, nodeinfo_t.offset is calculated as follows:
-------------------------------------------------------------------
[hokuda@dhcp-193-78 mod_cluster-1.3.1.Final]$ ag offset native/mod_manager/node.c
107:    ou->offset = sizeof(nodemess_t) + sizeof(apr_time_t) + sizeof(int);
108:    ou->offset = APR_ALIGN_DEFAULT(ou->offset);
142:    ou->offset = sizeof(nodemess_t) + sizeof(apr_time_t) + sizeof(int);
143:    ou->offset = APR_ALIGN_DEFAULT(ou->offset);
-------------------------------------------------------------------
So the value of nodeinfo_t.offset should be 320:
-------------------------------------------------------------------
(gdb) print sizeof(nodemess_t)
$16 = 304
(gdb) print sizeof(apr_time_t)
$17 = 8
(gdb) print sizeof(int)
$18 = 4
(gdb) print sizeof(nodemess_t) + sizeof(apr_time_t) + sizeof(int)
$19 = 316
(gdb) print (316 + 8 - 1) & ~(8-1)
$24 = 320
-------------------------------------------------------------------
But in fact:
-------------------------------------------------------------------
(gdb) print node.offset
$4 = 1090541500
-------------------------------------------------------------------
Additionally, the value 1090541500 is not even 8-byte aligned:
-------------------------------------------------------------------
(gdb) print (1090541500 + 8 - 1) & ~(8-1)
$27 = 1090541504
-------------------------------------------------------------------
I also noticed that node.offset is part of an address that points into the middle of the pcre_exec function:

(gdb) print node.offset
$4 = 1090541500 (= 0x410057bc)
(gdb) x/16 0x7f2a41316c48+316-8-4
0x7f2a41316d78: 0x00000000  0x00000000  0x410057bc  0x00007f2a   <== !!!
0x7f2a41316d88: 0x00000000  0x00000000  0x00000000  0x00007f2a
0x7f2a41316d98: 0x00000000  0x00007f2a  0x00000000  0x00000000
0x7f2a41316da8: 0x00000000  0x00000000  0x00000000  0x00000000
(gdb) disassemble 0x00007f2a410057bc
Dump of assembler code for function pcre_exec:
...
   0x00007f2a410057b7 <+2551>:  callq  0x7f2a40ffcfb0 <match>
   0x00007f2a410057bc <+2556>:  cmp    $0xfffffc1b,%eax   <===== !!!!
   0x00007f2a410057c1 <+2561>:  je     0x7f2a4100592d <pcre_exec+2925>
...

Since this is the instruction immediately after a callq, I think it is related to the stack (callq pushes the next instruction's PC onto the stack).
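For reference, the "offset should be 320" arithmetic from the gdb session can be redone in plain C. This is a sketch only: APR_ALIGN_DEFAULT is reproduced here as the usual round-up-to-8-bytes macro, which is an assumption about its expansion on this platform, and the struct sizes are hard-coded from the gdb output:
-------------------------------------------------------------------
#include <stddef.h>
#include <stdio.h>

/* stand-in for APR_ALIGN_DEFAULT: round up to the next multiple of 8 */
#define ALIGN_DEFAULT(size) (((size) + 7) & ~(size_t)7)

int main(void)
{
    /* sizes taken from the gdb session: nodemess_t=304, apr_time_t=8, int=4 */
    size_t expected = ALIGN_DEFAULT(304 + 8 + 4);   /* 316 rounded up -> 320 */
    size_t observed = 1090541500;                   /* node.offset in the core file */

    printf("expected offset: %zu\n", expected);
    printf("observed offset: %zu, 8-byte aligned: %s\n",
           observed, (observed % 8 == 0) ? "yes" : "no");
    return 0;
}
-------------------------------------------------------------------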
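The pointer arithmetic that turns the corrupted offset into the faulting address can also be replayed outside the core file. The sketch below is not mod_proxy_cluster code; it just repeats create_worker()'s "ptr = (char *)node + node->offset" step with the concrete values from the gdb session, without dereferencing anything:
-------------------------------------------------------------------
#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    uint64_t node_base = 0x7f2a41316c48ULL;  /* (gdb) print node        */
    uint64_t offset    = 1090541500ULL;      /* (gdb) print node.offset */
    uint64_t sane_off  = 320ULL;             /* value the analysis above expects */

    /* worker->s = (proxy_worker_shared *)((char *)node + node->offset); */
    printf("corrupted worker->s: 0x%" PRIx64 "\n", node_base + offset);    /* 0x7f2a8231c404 */
    printf("expected  worker->s: 0x%" PRIx64 "\n", node_base + sane_off);
    return 0;
}
-------------------------------------------------------------------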