mod_cluster / MODCLUSTER-526

SIGSEGV in remove_workers_node (mod_proxy_cluster.so) when using LoadBalancingGroup


Details

    Description

      Setup

      • 3 tomcats
      • 2 load balancing groups
      • 1 request every 3 seconds (no load at all)
      • shutdowns and kills of various nodes
      • the SIGSEGV occurs no later than the third kill/start iteration

      SIGSEGV

          #if AP_MODULE_MAGIC_AT_LEAST(20101223,1)
                  /* Here that is tricky the worker needs shared memory but we don't and CONFIG will reset it */
                  helper->index = 0; /* mark it removed */
                  worker->s = helper->shared;
                  memcpy(worker->s, stat, sizeof(proxy_worker_shared)); /* <-- crash */
          #else
                  worker->id = 0; /* mark it removed */
          #endif
      

      Behavior

       957 helper = (proxy_cluster_helper *) worker->context;
       961 if (helper) {
       962     i = helper->count_active;
       963 }
      
       968 if (i == 0) {
       971    proxy_worker_shared *stat = worker->s;
       972    proxy_cluster_helper *helper = (proxy_cluster_helper *) worker->context;
      

      At this point, helper->shared points to a proxy_worker_shared structure that appears to be properly filled.

       999    if (worker->cp->pool) {
      1000        apr_pool_destroy(worker->cp->pool);
      1001        worker->cp->pool = NULL;
      1002    }
      

      Regardless of whether the aforementioned block is present or not (I also tried moving it after line 1010),
      helper->shared suddenly becomes NULL.

      1008    helper->index = 0;
      1009    worker->s = helper->shared;
      

      The assignment above therefore makes worker->s point to NULL.

      1010    memcpy(worker->s, stat, sizeof(proxy_worker_shared));
      

      And here we go: memcpy() writes through the NULL worker->s, which is the SIGSEGV.

      IMHO, another thread has already cleared that memory and nulled the pointer, because this never happens when
      I run 1 process with 1 thread.

      The workaround that prevents the core dump looks like this:

      if (helper->shared) {
          worker->s = helper->shared;
          memcpy(worker->s, stat, sizeof(proxy_worker_shared));
      }
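      Note that the workaround reads helper->shared twice (once in the if, once in the assignment), so another
      thread could still null it in between. A minimal sketch of a slightly tighter variant reads the field once
      into a local. The structs are simplified, hypothetical stand-ins for the real httpd/mod_cluster types, and
      restore_shared_stats() is a hypothetical helper name. This only narrows the window: without a lock the
      unsynchronized read is still a data race, and the slot could be released right after it is read.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the real httpd/mod_cluster types. */
typedef struct {
    int status;
    char name[32];
} proxy_worker_shared;

typedef struct {
    int index;
    proxy_worker_shared *shared;
} proxy_cluster_helper;

typedef struct {
    proxy_worker_shared *s;
    void *context;
} proxy_worker;

/* Hypothetical helper mirroring lines 1008-1010 of remove_workers_node().
   Returns 1 if worker->s now points at helper's slot, 0 if the slot was gone. */
static int restore_shared_stats(proxy_worker *worker,
                                proxy_cluster_helper *helper,
                                const proxy_worker_shared *stat)
{
    proxy_worker_shared *shared;

    assert(worker != NULL && helper != NULL && stat != NULL);

    /* Read helper->shared exactly once: the NULL check and the memcpy()
       below both use this local copy. Not thread-safe on its own. */
    shared = helper->shared;

    helper->index = 0; /* mark it removed */
    if (shared == NULL) {
        return 0; /* slot already released; leave worker->s untouched */
    }
    worker->s = shared;
    if (shared != stat) { /* memcpy() must not get overlapping buffers */
        memcpy(shared, stat, sizeof(proxy_worker_shared));
    }
    return 1;
}
```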
      

      How do we fix it properly?

      Any ideas? rhn-engineering-jclere
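      One possible direction, sketched under assumptions: put every write that nulls helper->shared and the tail
      of remove_workers_node() behind the same lock, so the NULL check and the memcpy() cannot interleave with the
      clearing thread. The lock (shared_lock), release_shared_slot() and remove_worker_locked() are hypothetical
      names, and which code path actually nulls helper->shared is not established in this report. A pthread mutex
      only covers threads inside one httpd child; if the clearing happens in another process, an inter-process lock
      such as an APR global mutex (apr_global_mutex_t) would be needed instead.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the real httpd/mod_cluster types. */
typedef struct {
    int status;
} proxy_worker_shared;

typedef struct {
    int index;
    proxy_worker_shared *shared;
} proxy_cluster_helper;

typedef struct {
    proxy_worker_shared *s;
} proxy_worker;

/* Hypothetical lock that must guard every read and write of helper->shared. */
static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;

/* Whichever code path currently nulls helper->shared would take the lock. */
static void release_shared_slot(proxy_cluster_helper *helper)
{
    pthread_mutex_lock(&shared_lock);
    helper->shared = NULL;
    pthread_mutex_unlock(&shared_lock);
}

/* Tail of remove_workers_node() (lines 1008-1010), serialized against
   release_shared_slot(). Returns 1 if worker->s now points at helper's slot. */
static int remove_worker_locked(proxy_worker *worker, proxy_cluster_helper *helper)
{
    proxy_worker_shared *stat;
    int moved = 0;

    assert(worker != NULL && helper != NULL);

    pthread_mutex_lock(&shared_lock);
    stat = worker->s;
    helper->index = 0; /* mark it removed */
    if (helper->shared != NULL) {
        worker->s = helper->shared;
        if (worker->s != stat) { /* memcpy() must not get overlapping buffers */
            memcpy(worker->s, stat, sizeof(proxy_worker_shared));
        }
        moved = 1;
    }
    pthread_mutex_unlock(&shared_lock);
    return moved;
}
```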


            People

            Michal Karm (mbabacek1@redhat.com)
            Votes: 0
            Watchers: 6
