
Type: Bug
Resolution: Won't Do
Priority: Critical
Fix Version/s: None
Affects Version/s: None
Component/s: mod_cluster-native
Labels: None
Environment: RHEL 6.8 64-bit

Target Release: httpd 2.4.29 SP1 GA


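A crash occurs in mod_cluster while updating the workers of a node (update_workers_node), called from proxy_cluster_child_init when a new httpd child process starts: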

(gdb) bt
#0  __strncmp_ssse3 () at ../sysdeps/x86_64/strcmp.S:1737
#1  0x00007f2a30f13b06 in create_worker (node=0x7f2a41316c48, pool=0x7f2a43558b38) at mod_proxy_cluster.c:372
#2  add_balancers_workers (node=0x7f2a41316c48, pool=0x7f2a43558b38) at mod_proxy_cluster.c:875
#3  0x00007f2a30f149df in update_workers_node (conf=<value optimized out>, pool=0x7f2a43558b38, server=0x7f2a43564bc0, check=<value optimized out>) at mod_proxy_cluster.c:1061
#4  0x00007f2a30f18c93 in proxy_cluster_child_init (p=0x7f2a434aeb08, s=0x7f2a43564bc0) at mod_proxy_cluster.c:2703
#5  0x00007f2a4147cbac in ap_run_child_init (pchild=0x7f2a434aeb08, s=0x7f2a43564bc0) at config.c:167
#6  0x00007f2a3767d422 in child_main (child_num_arg=0) at worker.c:1242
#7  0x00007f2a3767e5a8 in make_child (s=0x7f2a43564bc0, slot=0) at worker.c:1424
#8  0x00007f2a3767f24c in perform_idle_server_maintenance (_pconf=<value optimized out>, plog=<value optimized out>, s=<value optimized out>) at worker.c:1632
#9  server_main_loop (_pconf=<value optimized out>, plog=<value optimized out>, s=<value optimized out>) at worker.c:1758
#10 worker_run (_pconf=<value optimized out>, plog=<value optimized out>, s=<value optimized out>) at worker.c:1829
#11 0x00007f2a41462b6e in ap_run_mpm (pconf=0x7f2a4345d138, plog=0x7f2a4348a358, s=0x7f2a43564bc0) at mpm_common.c:98
#12 0x00007f2a4145cc01 in main (argc=7, argv=0x7ffcd8412d78) at main.c:777

rhn-support-hokuda reviewed the core in more detail. His findings follow:

-------------------------------------------------------------------
(gdb) frame
#1  0x00007f2a30f13b06 in create_worker (node=0x7f2a41316c48, pool=0x7f2a43558b38) at mod_proxy_cluster.c:372
372         if (strncmp(worker->s->name, shared->name, sizeof(worker->s->name))) {
-------------------------------------------------------------------


According to mod_proxy.h:


-------------------------------------------------------------------
/* Runtime worker status informations. Shared in scoreboard */
typedef struct {
    char      name[PROXY_WORKER_MAX_NAME_SIZE];
    ...
} proxy_worker_shared;
-------------------------------------------------------------------


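Because name is the first member of proxy_worker_shared, the address worker->s->name is the same as worker->s, so the strncmp() on line 372 dereferences worker->s itself.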
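The aliasing can be illustrated with a minimal sketch. It uses a hypothetical, cut-down stand-in for proxy_worker_shared (example_worker_shared and EXAMPLE_MAX_NAME_SIZE are illustrative names, not httpd's); only the fact that name is the first member matters. C guarantees there is no padding before the first member, so a bad worker->s pointer faults on the very first byte strncmp() reads.

```c
#include <stddef.h>

/* Hypothetical stand-in for httpd's proxy_worker_shared. The size is an
 * assumption for illustration, not PROXY_WORKER_MAX_NAME_SIZE. */
#define EXAMPLE_MAX_NAME_SIZE 96

typedef struct {
    char name[EXAMPLE_MAX_NAME_SIZE]; /* first member, like the real struct */
    int  other_field;                 /* stands in for the remaining members */
} example_worker_shared;

/* The first member of a struct always sits at offset 0. */
_Static_assert(offsetof(example_worker_shared, name) == 0,
               "first member must be at offset 0");

/* Returns 1 when s->name and s refer to the same address. */
static int name_aliases_struct(const example_worker_shared *s)
{
    return (const void *)s->name == (const void *)s;
}
```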


-------------------------------------------------------------------
(gdb) print worker.s
$7 = (proxy_worker_shared *) 0x7f2a8231c404
(gdb) print *worker.s
Cannot access memory at address 0x7f2a8231c404 <== caused the segfault
-------------------------------------------------------------------


worker->s is calculated as follows:


-------------------------------------------------------------------
static apr_status_t create_worker(proxy_server_conf *conf, proxy_balancer *balancer,
                          server_rec *server,
                          nodeinfo_t *node, apr_pool_t *pool)
#if AP_MODULE_MAGIC_AT_LEAST(20101223,1)
{
    ...
    char *ptr;
    ...
    ptr = (char *) node;
    ptr = ptr + node->offset;
    ...
    worker->s = (proxy_worker_shared *) ptr;
-------------------------------------------------------------------


Let's see 'node':


-------------------------------------------------------------------
(gdb) print node
$3 = (nodeinfo_t *) 0x7f2a41316c48
(gdb) print node.offset
$4 = 1090541500
(gdb) print/x 0x7f2a41316c48 + 1090541500
$6 = 0x7f2a8231c404
(gdb) print worker.s
$7 = (proxy_worker_shared *) 0x7f2a8231c404
-------------------------------------------------------------------
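The gdb arithmetic above can be reproduced with plain integers, as a sketch of the pointer computation in create_worker() (ptr = (char *) node + node->offset). It assumes offset is an int, as the sizeof(int) term in node.c suggests; the helper name shared_worker_addr is hypothetical. With the expected offset of 320, the same computation would land at 0x7f2a41316d88, just past the node header, instead of the unmapped 0x7f2a8231c404.

```c
#include <stdint.h>

/* Mirrors create_worker(): the shared worker data is taken to start
 * `offset` bytes past the nodeinfo_t. Integers are used instead of
 * pointers so the arithmetic can be checked without touching memory.
 * The int offset is sign-extended, as it would be in char* arithmetic. */
static uint64_t shared_worker_addr(uint64_t node_addr, int offset)
{
    return node_addr + (uint64_t)(int64_t)offset;
}
```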


According to node.c, nodeinfo_t.offset is calculated as follows:


-------------------------------------------------------------------
[hokuda@dhcp-193-78 mod_cluster-1.3.1.Final]$ ag offset
native/mod_manager/node.c
107:        ou->offset = sizeof(nodemess_t) + sizeof(apr_time_t) + sizeof(int);
108:        ou->offset = APR_ALIGN_DEFAULT(ou->offset);
142:    ou->offset = sizeof(nodemess_t) + sizeof(apr_time_t) + sizeof(int);
143:    ou->offset = APR_ALIGN_DEFAULT(ou->offset);
-------------------------------------------------------------------


So, the value of nodeinfo_t.offset should be 320:


-------------------------------------------------------------------
(gdb) print sizeof(nodemess_t)
$16 = 304
(gdb) print sizeof(apr_time_t)
$17 = 8
(gdb) print sizeof(int)
$18 = 4
(gdb) print sizeof(nodemess_t) + sizeof(apr_time_t) + sizeof(int)
$19 = 316
(gdb) print (316 + 8 - 1) & ~(8-1)
$24 = 320
-------------------------------------------------------------------
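The expected value follows from node.c's formula. Below is a sketch that re-declares APR's rounding locally (APR_ALIGN_DEFAULT(size) is APR_ALIGN(size, 8)) so it compiles without APR headers; EXAMPLE_ALIGN and expected_offset are hypothetical names. The structure sizes are passed in because they depend on the build; this core reported 304, 8 and 4.

```c
#include <stddef.h>

/* Same rounding as APR_ALIGN(size, boundary): round size up to the next
 * multiple of boundary (boundary must be a power of two). */
#define EXAMPLE_ALIGN(size, boundary) \
    (((size) + ((boundary) - 1)) & ~((boundary) - 1))

/* nodeinfo_t.offset as node.c computes it: header sizes summed, then
 * rounded up to an 8-byte boundary like APR_ALIGN_DEFAULT. */
static size_t expected_offset(size_t nodemess_size, size_t time_size,
                              size_t int_size)
{
    return EXAMPLE_ALIGN(nodemess_size + time_size + int_size, (size_t)8);
}
```

With the sizes from this core, expected_offset(304, 8, 4) rounds 316 up to 320.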


But the actual value in the core is:


-------------------------------------------------------------------
(gdb) print node.offset
$4 = 1090541500
-------------------------------------------------------------------


Additionally, the value 1090541500 is not 8-byte aligned, although node.c always passes offset through APR_ALIGN_DEFAULT:


-------------------------------------------------------------------
(gdb) print (1090541500 + 8 - 1) & ~(8-1)
$27 = 1090541504
-------------------------------------------------------------------
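Since every offset written by node.c is rounded to 8 bytes, any value that is not a multiple of 8 cannot have come from that code. A minimal sketch of the check (is_aligned is a hypothetical helper, not part of mod_cluster): 1090541500 leaves a remainder of 4 modulo 8, which is why rounding it up yields 1090541504.

```c
#include <stdint.h>

/* Returns nonzero when value is a multiple of boundary.
 * boundary must be a power of two. */
static int is_aligned(uint64_t value, uint64_t boundary)
{
    return (value & (boundary - 1)) == 0;
}
```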

.....

I noticed that node.offset, together with the 4 bytes that follow it, forms the 64-bit address 0x00007f2a410057bc, which points into the middle of the pcre_exec function:


(gdb) print node.offset
$4 = 1090541500 (=0x410057bc)

(gdb) x/16  0x7f2a41316c48+316-8-4
0x7f2a41316d78: 0x00000000      0x00000000      0x410057bc      0x00007f2a <==!!!
0x7f2a41316d88: 0x00000000      0x00000000      0x00000000      0x00007f2a
0x7f2a41316d98: 0x00000000      0x00007f2a      0x00000000      0x00000000
0x7f2a41316da8: 0x00000000      0x00000000      0x00000000      0x00000000
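The dump above starts at node + 304, so, assuming the layout implied by node.c's offset formula (nodemess_t, then an 8-byte apr_time_t, then the int offset), the third word is offset and the fourth is the 4 bytes after it. x/16 prints 32-bit words; on little-endian x86_64 the lower-addressed word is the low half of a 64-bit value. A sketch of the reconstruction (join_le_words is a hypothetical helper):

```c
#include <stdint.h>

/* Combine two 32-bit words read from consecutive little-endian addresses
 * into the 64-bit value they store. */
static uint64_t join_le_words(uint32_t low_word, uint32_t high_word)
{
    return ((uint64_t)high_word << 32) | low_word;
}
```

Applied to the flagged words, join_le_words(0x410057bc, 0x00007f2a) gives 0x00007f2a410057bc, and truncating it back to 32 bits gives 1090541500, the bogus offset.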

(gdb) disassemble 0x00007f2a410057bc                                                                                                    
Dump of assembler code for function pcre_exec:
   ...
   0x00007f2a410057b7 <+2551>:  callq  0x7f2a40ffcfb0 <match>
   0x00007f2a410057bc <+2556>:  cmp    $0xfffffc1b,%eax      <=====!!!!
   0x00007f2a410057c1 <+2561>:  je     0x7f2a4100592d <pcre_exec+2925>
   ...


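Since this address is the instruction immediately after a callq, it looks like a return address (callq pushes the address of the next instruction onto the stack). I think the memory that node points to contains stack data rather than a valid nodeinfo_t.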
