- Bug
- Resolution: Won't Do
- Critical
- None
- None
- None
- RHEL 6.8 64-bit
Crash occurring in mod_cluster when updating worker nodes. The backtrace is included in the detailed analysis below.
rhn-support-hokuda has reviewed this in more detail; I will share his findings:
-------------------------------------------------------------------
(gdb) bt
#0 __strncmp_ssse3 () at ../sysdeps/x86_64/strcmp.S:1737
#1 0x00007f2a30f13b06 in create_worker (node=0x7f2a41316c48, pool=0x7f2a43558b38) at mod_proxy_cluster.c:372
#2 add_balancers_workers (node=0x7f2a41316c48, pool=0x7f2a43558b38) at mod_proxy_cluster.c:875
#3 0x00007f2a30f149df in update_workers_node (conf=<value optimized out>, pool=0x7f2a43558b38, server=0x7f2a43564bc0,
check=<value optimized out>) at mod_proxy_cluster.c:1061
#4 0x00007f2a30f18c93 in proxy_cluster_child_init (p=0x7f2a434aeb08, s=0x7f2a43564bc0) at mod_proxy_cluster.c:2703
#5 0x00007f2a4147cbac in ap_run_child_init (pchild=0x7f2a434aeb08, s=0x7f2a43564bc0) at config.c:167
#6 0x00007f2a3767d422 in child_main (child_num_arg=0) at worker.c:1242
#7 0x00007f2a3767e5a8 in make_child (s=0x7f2a43564bc0, slot=0) at worker.c:1424
#8 0x00007f2a3767f24c in perform_idle_server_maintenance (_pconf=<value optimized out>, plog=<value optimized out>,
s=<value optimized out>) at worker.c:1632
#9 server_main_loop (_pconf=<value optimized out>, plog=<value optimized out>, s=<value optimized out>) at worker.c:1758
#10 worker_run (_pconf=<value optimized out>, plog=<value optimized out>, s=<value optimized out>) at worker.c:1829
#11 0x00007f2a41462b6e in ap_run_mpm (pconf=0x7f2a4345d138, plog=0x7f2a4348a358, s=0x7f2a43564bc0) at mpm_common.c:98
#12 0x00007f2a4145cc01 in main (argc=7, argv=0x7ffcd8412d78) at main.c:777
(gdb) frame
#1 0x00007f2a30f13b06 in create_worker (node=0x7f2a41316c48, pool=0x7f2a43558b38) at mod_proxy_cluster.c:372
372 if (strncmp(worker->s->name, shared->name, sizeof(worker->s->name))) {
-------------------------------------------------------------------
According to mod_proxy.h:
-------------------------------------------------------------------
/* Runtime worker status informations. Shared in scoreboard */
typedef struct {
char name[PROXY_WORKER_MAX_NAME_SIZE];
...
} proxy_worker_shared;
-------------------------------------------------------------------
therefore, since name is the first member of the struct, worker->s->name has the same address as worker->s.
-------------------------------------------------------------------
(gdb) print worker.s
$7 = (proxy_worker_shared *) 0x7f2a8231c404
(gdb) print *worker.s
Cannot access memory at address 0x7f2a8231c404 <== caused the segfault
-------------------------------------------------------------------
worker->s is calculated as follows:
-------------------------------------------------------------------
static apr_status_t create_worker(proxy_server_conf *conf, proxy_balancer *balancer,
server_rec *server,
nodeinfo_t *node, apr_pool_t *pool)
#if AP_MODULE_MAGIC_AT_LEAST(20101223,1)
{
...
char *ptr;
...
ptr = (char *) node;
ptr = ptr + node->offset;
...
worker->s = (proxy_worker_shared *) ptr;
-------------------------------------------------------------------
Let's see 'node':
-------------------------------------------------------------------
(gdb) print node
$3 = (nodeinfo_t *) 0x7f2a41316c48
(gdb) print node.offset
$4 = 1090541500
(gdb) print/x 0x7f2a41316c48 + 1090541500
$6 = 0x7f2a8231c404
(gdb) print worker.s
$7 = (proxy_worker_shared *) 0x7f2a8231c404
-------------------------------------------------------------------
According to node.c, nodeinfo_t.offset is calculated as follows:
-------------------------------------------------------------------
[hokuda@dhcp-193-78 mod_cluster-1.3.1.Final]$ ag offset
native/mod_manager/node.c
107: ou->offset = sizeof(nodemess_t) + sizeof(apr_time_t) + sizeof(int);
108: ou->offset = APR_ALIGN_DEFAULT(ou->offset);
142: ou->offset = sizeof(nodemess_t) + sizeof(apr_time_t) + sizeof(int);
143: ou->offset = APR_ALIGN_DEFAULT(ou->offset);
-------------------------------------------------------------------
So, the value of nodeinfo_t.offset should be 320:
-------------------------------------------------------------------
(gdb) print sizeof(nodemess_t)
$16 = 304
(gdb) print sizeof(apr_time_t)
$17 = 8
(gdb) print sizeof(int)
$18 = 4
(gdb) print sizeof(nodemess_t) + sizeof(apr_time_t) + sizeof(int)
$19 = 316
(gdb) print (316 + 8 - 1) & ~(8-1)
$24 = 320
-------------------------------------------------------------------
But in fact,
-------------------------------------------------------------------
(gdb) print node.offset
$4 = 1090541500
-------------------------------------------------------------------
Additionally, the value 1090541500 is not 8-byte aligned:
-------------------------------------------------------------------
(gdb) print (1090541500 + 8 - 1) & ~(8-1)
$27 = 1090541504
-------------------------------------------------------------------
.....
I noticed that node.offset is part of an address pointing into the middle of the pcre_exec function:
(gdb) print node.offset
$4 = 1090541500 (=0x410057bc)
(gdb) x/16 0x7f2a41316c48+316-8-4
0x7f2a41316d78: 0x00000000 0x00000000 0x410057bc 0x00007f2a <==!!!
0x7f2a41316d88: 0x00000000 0x00000000 0x00000000 0x00007f2a
0x7f2a41316d98: 0x00000000 0x00007f2a 0x00000000 0x00000000
0x7f2a41316da8: 0x00000000 0x00000000 0x00000000 0x00000000
(gdb) disassemble 0x00007f2a410057bc
Dump of assembler code for function pcre_exec:
...
0x00007f2a410057b7 <+2551>: callq 0x7f2a40ffcfb0 <match>
0x00007f2a410057bc <+2556>: cmp $0xfffffc1b,%eax <=====!!!!
0x00007f2a410057c1 <+2561>: je 0x7f2a4100592d <pcre_exec+2925>
...
Since this address is immediately after a callq instruction, I think the corruption is related to the stack: callq pushes the address of the next instruction onto the stack as the return address, and that value appears to have been written over the node entry.