Type: Feature Request
Resolution: Done
Priority: Major
Fix Version/s: 3.5
Affects Version/s: None
Labels: None


The goal is to reduce the number of discovery calls to a backend store. This applies to FILE_PING when the files are located on a shared drive (where reads and writes are RPCs) and to all cloud store discovery protocols such as S3_PING, RACKSPACE_PING, SWIFT_PING and GOOGLE_PING.

This is mainly done by:

Having one file containing the information for all members rather than one file per member. This means just 1 read instead of N reads (N = cluster size)
Providing a bootstrap file containing the UUID, logical name and IP address of all members up front

For example, if we start a cluster of 1000 nodes, the cost of the existing mechanism would be 1 call for the first member, 2 for the second, and so on, for a total of roughly N*N/2 = ~500'000 calls. With the new algorithm, it would be N = 1'000 calls.
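The arithmetic above can be checked with a small sketch. The class and method names are hypothetical and not part of JGroups; the only assumption is the cost model stated above (the i-th member to start performs i reads under the old scheme, and exactly 1 read under the new one).

```java
final class DiscoveryCost {
    // Old scheme: the i-th member reads i per-member files, so the total is 1 + 2 + ... + n.
    static long perMemberFiles(long n) {
        return n * (n + 1) / 2;
    }

    // New scheme: every member reads the single shared file once.
    static long singleFile(long n) {
        return n;
    }

    public static void main(String[] args) {
        long n = 1000;
        System.out.println("per-member files: " + perMemberFiles(n)); // 500500
        System.out.println("single file:      " + singleFile(n));     // 1000
    }
}
```

The exact sum is N*(N+1)/2, which for N = 1000 is 500'500 and matches the rough figure of N*N/2.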

This not only saves money (if a cloud store is used, ingress or egress traffic may be charged for), but also reduces overall latency because fewer calls are made.

The bootstrap file lists all members in the following format:

Logical name	UUID	IP address:port	coord
A	1	192.168.1.5:7800	true
B	2	192.168.1.6:7800	false
C	3	192.168.1.7:7800	false
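A minimal sketch of reading this format follows. It assumes whitespace-separated columns and a header row starting with "Logical name"; the class `MembersFile` and its `Entry` type are hypothetical illustrations, not the classes used by the actual protocol.

```java
import java.util.ArrayList;
import java.util.List;

final class MembersFile {
    /** One row of the bootstrap file (field names are illustrative). */
    static final class Entry {
        final String logicalName;
        final String uuid;
        final String host;
        final int port;
        final boolean coord;

        Entry(String logicalName, String uuid, String host, int port, boolean coord) {
            this.logicalName = logicalName;
            this.uuid = uuid;
            this.host = host;
            this.port = port;
            this.coord = coord;
        }
    }

    /** Parses the rows of the file, skipping blank lines and the header row. */
    static List<Entry> parse(List<String> lines) {
        List<Entry> entries = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("Logical name")) {
                continue;
            }
            String[] f = trimmed.split("\\s+");
            if (f.length != 4) {
                throw new IllegalArgumentException("expected 4 fields: " + line);
            }
            int colon = f[2].lastIndexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("expected address:port: " + f[2]);
            }
            String host = f[2].substring(0, colon);
            int port = Integer.parseInt(f[2].substring(colon + 1));
            entries.add(new Entry(f[0], f[1], host, port, Boolean.parseBoolean(f[3])));
        }
        return entries;
    }
}
```

One read of the file yields the full membership, including which entry is marked as coordinator.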

The file could be located on a (shared) file system, S3, a DB table or a cloud store.

This could possibly be an alternative implementation of FILE_PING, S3_PING, GOOGLE_PING, SWIFT_PING etc.

On startup, the static discovery protocol reads this file and populates the UUID.cache and TP.logical_addr_cache caches in the transport.

Once this is done, there is no need for lookups as the caches should have the complete information. Note that TP.logical_addr_cache_max_size should be greater than the max number of nodes.

When nodes are started, they need to be given the logical name and UUID indicated in the file. The former can be done via JChannel.name(String name), the latter should be done via an AddressGenerator.

Note that UUIDs cannot be reused, so when a channel is disconnected and subsequently reconnected, the address generator should pick a different UUID (perhaps a random one). This should be reflected in the config file as well.
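The rule that a UUID is used only once can be sketched as follows. This is not the JGroups AddressGenerator interface: it uses java.util.UUID and a plain Supplier as stand-ins, and the class name is hypothetical. The first call returns the UUID configured in the file; every later call (i.e. after a reconnect) returns a random one.

```java
import java.util.UUID;
import java.util.function.Supplier;

final class OneShotUuidSupplier implements Supplier<UUID> {
    // UUID taken from the bootstrap file; set to null once it has been handed out.
    private UUID configured;

    OneShotUuidSupplier(UUID configured) {
        this.configured = configured;
    }

    @Override
    public synchronized UUID get() {
        if (configured != null) {
            UUID first = configured;
            configured = null; // never hand out the same UUID twice
            return first;
        }
        return UUID.randomUUID(); // reconnect: pick a fresh random UUID
    }
}
```

In a real setup the same logic would sit inside an address generator registered with the channel, and the newly chosen UUID would still have to be written back to the file.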

Also, nodes need to be started in the order in which they are listed. The coordinator to contact for joining the cluster is marked, so ideally only 1 JOIN request-response round is needed.

The goals of this protocol are:

Used when IP multicast is not available
Quick startup
Reducing the number of calls to the cloud store (latency!)
- Instead of N calls to the (cloud) store, only 1 call is needed (to read the file)
Large clusters: simulating multicast with N-1 unicast messages quickly generates too much traffic in the discovery phase

Coordinator changes

When the coordinator changes, the new coordinator needs to update the file to mark itself as coordinator, so that nodes started afterwards contact the right coordinator.
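The file update could look roughly like the sketch below. It assumes the four-column row format shown earlier, with rows identified by their UUID column; `CoordinatorMarker` is a hypothetical helper, and actually writing the rows back to the store is left out.

```java
import java.util.ArrayList;
import java.util.List;

final class CoordinatorMarker {
    /**
     * Returns a copy of the rows in which only the row with the given UUID
     * has coord=true. Rows that do not have exactly 4 fields (such as the
     * header) are passed through unchanged.
     */
    static List<String> markCoordinator(List<String> rows, String newCoordUuid) {
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            String[] f = row.trim().split("\\s+");
            if (f.length != 4) {
                out.add(row);
                continue;
            }
            boolean isCoord = f[1].equals(newCoordUuid);
            out.add(String.join("\t", f[0], f[1], f[2], String.valueOf(isCoord)));
        }
        return out;
    }
}
```

For the example file, calling `markCoordinator(rows, "2")` would clear the flag on A and set it on B.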

New members (not listed) join

This could be handled by either changing the bootstrap file manually or dynamically:

The new member reads the file and sends an INFO message with its UUID, logical_name and IP address to all members
- (This is done before sending a JOIN request to the coordinator)
Every member updates their local cache when receiving the INFO message
The coordinator, upon reception of an INFO message, updates the file
- This ensures that only 1 node (the coord) updates the file and prevents corruption of the file through concurrent updates
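The dynamic variant above can be sketched as a small in-memory simulation. All names here are hypothetical, the shared file is modelled as a list of rows, and no real messaging is involved; the point is only that every member updates its cache while the coordinator alone appends to the file.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class InfoHandling {
    /** Payload of the INFO message sent by a new member. */
    static final class Info {
        final String uuid;
        final String logicalName;
        final String address;

        Info(String uuid, String logicalName, String address) {
            this.uuid = uuid;
            this.logicalName = logicalName;
            this.address = address;
        }
    }

    static final class Member {
        final boolean coordinator;
        final Map<String, String> cache = new LinkedHashMap<>(); // uuid -> "name@address"
        final List<String> file; // stand-in for the shared bootstrap file

        Member(boolean coordinator, List<String> file) {
            this.coordinator = coordinator;
            this.file = file;
        }

        void onInfo(Info info) {
            // Every member records the new node in its local cache.
            cache.put(info.uuid, info.logicalName + "@" + info.address);
            // Only the coordinator writes, which avoids concurrent updates of the file.
            if (coordinator) {
                file.add(String.join("\t", info.logicalName, info.uuid, info.address, "false"));
            }
        }
    }
}
```

Because the write happens in exactly one place, the file gains one row per new member regardless of how many members receive the INFO message.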

References

See https://github.com/belaban/JGroups/blob/master/doc/design/CloudBasedDiscovery.txt for the design

is related to

JGRP-1826 Discovery: file-based discovery protocols should not send discovery requests

Resolved

Assignee: Bela Ban

Reporter: Bela Ban

Votes: 0

Watchers: 3

Created: 2014/05/22 5:00 AM

Updated: 2014/06/27 3:31 AM

Resolved: 2014/06/27 3:31 AM
