Infinispan / ISPN-3922

Hot Rod client should temporarily shun nodes after a connection read timeout to improve performance with replicated caches


      If a Hot Rod client accesses a clustered replicated cache, it selects the target server with a round-robin policy.
      If a node fails, e.g. because of a long garbage collection pause or network issues, cache operations routed to it fail with a network timeout.
      It can take a while before the node recovers or is dropped from the cluster and the client's cluster view is updated. Until then, round-robin keeps routing requests to the failed node, once every N requests, where N is the number of cluster nodes.
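To make the failure mode concrete, here is a minimal sketch of plain round-robin selection, assuming the client simply rotates over the server list it last received. The class `PlainRoundRobin` and the server names are hypothetical and do not reflect the Hot Rod client's internal classes.

```java
import java.util.List;

/**
 * Hypothetical plain round-robin selector (illustration only, not the
 * Hot Rod client API): it rotates through all known servers regardless
 * of their health.
 */
final class PlainRoundRobin {
    private final List<String> servers;
    private int index = 0;

    PlainRoundRobin(List<String> servers) {
        if (servers.isEmpty()) throw new IllegalArgumentException("no servers");
        this.servers = List.copyOf(servers);
    }

    /** Returns the next server; an unresponsive server is still returned once every servers.size() calls. */
    synchronized String next() {
        String server = servers.get(index);
        index = (index + 1) % servers.size();
        return server;
    }
}
```

With `List.of("node-a", "node-b")`, `next()` alternates between the two names, so if `node-b` is stuck in a GC pause, every second request goes to it.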

      In a two-node cluster, every second call therefore waits for the configured timeout before the request is served by the remaining node, which is a huge performance drawback.
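The cost can be estimated with a rough back-of-the-envelope model, assuming exactly one of the $N$ servers is unresponsive, every request routed to it waits the full read timeout $T$ before being retried elsewhere, and the retry's own latency is ignored:

$$
P(\text{request hits the unresponsive node}) = \frac{1}{N},
\qquad
\mathbb{E}[\text{added latency per request}] \approx \frac{T}{N}.
$$

For $N = 2$ this gives $1/2$ and $T/2$: every second request stalls for the whole timeout, so the mean latency is dominated by $T$ rather than by the normal response time.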

      To improve performance in such cases, a server whose connection failed should be excluded from the round-robin rotation for a shunning period, or until the server side provides a new cluster view.
      To get this behaviour without extra configuration, the shunning period should default to the default JGroups failure-detection time (i.e. 10 seconds).
      It should be possible to customize this period through a Hot Rod client property.
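One possible shape of the proposal, sketched under assumptions: a server is shunned when a request to it times out, comes back into rotation once its shunning period has elapsed, and all shuns are lifted when a new cluster view arrives. The class `ShunningRoundRobin`, its methods, the property key `example.hotrod.shun_period_ms` and the fallback used when every server is shunned are all hypothetical; only the 10-second default comes from the issue. The clock is injected so the period can be tested without sleeping.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.function.LongSupplier;

/**
 * Hypothetical round-robin selector that shuns a server for a fixed period
 * after a read timeout, or until a new cluster view arrives.
 * Illustrative sketch only; not the Hot Rod client's actual balancer API.
 */
final class ShunningRoundRobin {
    // Hypothetical property key made up for this sketch (not a real Hot Rod property).
    static final String SHUN_PERIOD_PROPERTY = "example.hotrod.shun_period_ms";
    // Default proposed in the issue: aligned with JGroups failure detection (10 s).
    static final Duration DEFAULT_SHUN_PERIOD = Duration.ofSeconds(10);

    private final Map<String, Long> shunnedUntilNanos = new HashMap<>();
    private final long shunPeriodNanos;
    private final LongSupplier nanoClock; // e.g. System::nanoTime; injectable for tests
    private List<String> servers;
    private int index = 0;

    ShunningRoundRobin(List<String> servers, Duration shunPeriod, LongSupplier nanoClock) {
        if (servers.isEmpty()) throw new IllegalArgumentException("no servers");
        this.servers = List.copyOf(servers);
        this.shunPeriodNanos = shunPeriod.toNanos();
        this.nanoClock = nanoClock;
    }

    /** Reads the shunning period (milliseconds) from client properties, else the default. */
    static Duration shunPeriodFrom(Properties props) {
        String value = props.getProperty(SHUN_PERIOD_PROPERTY);
        if (value == null || value.isBlank()) return DEFAULT_SHUN_PERIOD;
        return Duration.ofMillis(Long.parseLong(value.trim()));
    }

    /** Called when a request to {@code server} failed with a read timeout. */
    synchronized void reportTimeout(String server) {
        shunnedUntilNanos.put(server, nanoClock.getAsLong() + shunPeriodNanos);
    }

    /** A new cluster view from the server side replaces the list and lifts all shuns. */
    synchronized void updateClusterView(List<String> newServers) {
        if (newServers.isEmpty()) throw new IllegalArgumentException("no servers");
        servers = List.copyOf(newServers);
        shunnedUntilNanos.clear();
        index = 0;
    }

    /** Next server in rotation, skipping shunned ones; if all are shunned, plain round-robin. */
    synchronized String next() {
        long now = nanoClock.getAsLong();
        int n = servers.size();
        for (int i = 0; i < n; i++) {
            int pos = (index + i) % n;
            String candidate = servers.get(pos);
            Long until = shunnedUntilNanos.get(candidate);
            if (until == null || now - until >= 0) { // overflow-safe nanoTime comparison
                shunnedUntilNanos.remove(candidate);  // period expired: back in rotation
                index = (pos + 1) % n;
                return candidate;
            }
        }
        String fallback = servers.get(index); // every server shunned: don't leave the caller without a target
        index = (index + 1) % n;
        return fallback;
    }
}
```

In production the selector would be built as `new ShunningRoundRobin(servers, ShunningRoundRobin.shunPeriodFrom(props), System::nanoTime)`. Returning a shunned server when every server is shunned is a judgment call made here so a client always has a target; failing fast would be an equally valid alternative.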

Assignee: Unassigned
Reporter: Wolf Fink (rhn-support-wfink)