mod_cluster / MODCLUSTER-434

Enable workers to tell the balancer they are overloaded at a certain threshold


Details

    • Type: Feature Request
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version: 2.1.0.Final
    • Affects Version: 1.2.10.Final, 1.3.1.Final
    • Component: Core + SPIs
    • Labels: None

    Description

      The discussion on "Server load threshold", initiated by arquitectura_dgsiaf and godiedelrio_jira, revealed a possible need for workers to be able to trigger failover once they reach a certain load threshold. For instance, one might want to treat a node in the cluster as desperately overloaded at Load 40 and have failover triggered.

      Formerly, similar behavior could be achieved with a custom load metric returning Load 0, thus marking the node as a "stand-by" one. This is no longer possible, though; see MODCLUSTER-279 ("mod_cluster returns 503s after STATUS").

      I see a few viable ways of adding the described logic to the current code base:

      1. Leveraging Load: -1 (worker in error) or Load: 0 (stand-by worker)

      We could simply modify DynamicLoadBalanceFactorProvider.java so that it doesn't apply ceiling/floor normalization if load == -1. The question is what should happen when we want the node back. Would returning Load > 0 make the balancer change the node's status from NOTOK to OK again, even after the removal period? I think yes, because the node would keep responding to cping/cpong. This approach seems legitimate, IMHO.
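To illustrate option 1, here is a minimal sketch of a normalization step that lets -1 pass through untouched. It is not the actual DynamicLoadBalanceFactorProvider code: the class name, the method name, and the assumed 1..100 clamping range are hypothetical and exist only to show the idea.

```java
// Hypothetical sketch: not the actual DynamicLoadBalanceFactorProvider code.
final class LoadFactorSketch {
    static final int WORKER_IN_ERROR = -1; // "Load: -1" from the option above
    static final int MIN_LOAD = 1;         // assumed floor of the normalization
    static final int MAX_LOAD = 100;       // assumed ceiling of the normalization

    // Clamps the load into [MIN_LOAD, MAX_LOAD], except that -1 is
    // returned as-is so a worker can report itself as being in error.
    static int normalize(int load) {
        if (load == WORKER_IN_ERROR) {
            return WORKER_IN_ERROR;
        }
        return Math.max(MIN_LOAD, Math.min(MAX_LOAD, load));
    }
}
```

With this shape, a custom metric that decides the node is overloaded would return -1, and any other value would still be normalized as before.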

      2. STOP-APP message

      Well, hey, don't we actually have a message for this? What if we allowed a custom load metric to access the mod_cluster subsystem API so that it could send a STOP-APP message with Range=NODE? It could then send ENABLE-APP as soon as the load settles below the threshold.
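The STOP-APP / ENABLE-APP idea could be sketched roughly as follows. The CommandSender interface and the OverloadGate class are hypothetical stand-ins for whatever API the mod_cluster subsystem would expose; no such API exists in this form today. The sketch assumes that a higher load value means a busier node, and it only tracks state so that each message is sent once per transition.

```java
// Hypothetical stand-in for a subsystem API able to send MCMP messages.
interface CommandSender {
    void send(String command);
}

// Hypothetical helper a custom load metric could call on every sample.
final class OverloadGate {
    private final double threshold;
    private final CommandSender sender;
    private boolean stopped; // true once STOP-APP has been sent

    OverloadGate(double threshold, CommandSender sender) {
        this.threshold = threshold;
        this.sender = sender;
    }

    // Sends STOP-APP when the load reaches the threshold and ENABLE-APP
    // once it settles below it again; repeated samples on the same side
    // of the threshold send nothing.
    void onLoadSample(double load) {
        if (!stopped && load >= threshold) {
            sender.send("STOP-APP");
            stopped = true;
        } else if (stopped && load < threshold) {
            sender.send("ENABLE-APP");
            stopped = false;
        }
    }
}
```

A real implementation would probably want some hysteresis or a minimum dwell time so that a load hovering around the threshold does not flap between the two messages.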

      3. An additional threshold attribute

      I don't favor this one, but it's still an option. We might add a configuration attribute, threshold, to both the mod_cluster subsystem and the native mod_proxy_cluster code, allowing us to set a threshold on a per-worker basis. Then, in the function internal_find_best_byrequests, we could read this threshold and check whether the lbfactor is below it.
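The selection logic for option 3 might look roughly like the model below. The real internal_find_best_byrequests is native C code in mod_proxy_cluster and works differently in detail; this Java sketch only models the threshold check. WorkerSketch, BalancerSketch, and the rule of picking the highest remaining lbfactor are all assumptions made for illustration.

```java
// Hypothetical model of a worker as seen by the balancer.
final class WorkerSketch {
    final String name;
    final int lbfactor;  // load factor reported by the worker
    final int threshold; // proposed per-worker threshold attribute

    WorkerSketch(String name, int lbfactor, int threshold) {
        this.name = name;
        this.lbfactor = lbfactor;
        this.threshold = threshold;
    }
}

final class BalancerSketch {
    // Skips every worker whose lbfactor is below its own threshold and
    // returns the remaining worker with the highest lbfactor, or null
    // when all workers are below their thresholds.
    static WorkerSketch findBest(WorkerSketch[] workers) {
        WorkerSketch best = null;
        for (WorkerSketch w : workers) {
            if (w.lbfactor < w.threshold) {
                continue; // considered overloaded, not a candidate
            }
            if (best == null || w.lbfactor > best.lbfactor) {
                best = w;
            }
        }
        return best;
    }
}
```

What the balancer should do when every worker is below its threshold (return an error, or fall back to the least-bad worker) is left open here, as it is in the proposal.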

      WDYT?

          People

            rhn-engineering-rhusar Radoslav Husar
            mbabacek1@redhat.com Michal Karm
            Votes: 5
            Watchers: 6
