Type: Bug
Resolution: Done
Priority: Major
I recently filed JBEAP-6016 and JBEAP-6017. Although I have closed both of them (they are not bugs), they describe a problem that anyone using HA on Azure could face, one which can be fairly annoying and which logs errors into the server log.
When a cluster with AZURE_PING is started, it saves discovery files into an Azure storage container. If the cluster is shut down cleanly, the files are removed and everything is fine. But if something goes wrong (e.g. kill -9 EAP_PID), the files are left behind.
New servers configured to use the same storage container will see the leftover files and try to contact the coordinator node recorded in them. That is expected behavior, since a new node has no way of knowing whether a file is still valid, but it can slow down server startup, which may cause problems if unaccounted for.
The pbcast.GMS JGroups protocol can be configured to retry the JOIN fewer times. This is done by adding the max_join_attempts property to the pbcast.GMS protocol in the JGroups stack; its value is the number of JOIN attempts the node makes when contacting the existing coordinator (0 means infinite attempts).
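As a minimal sketch, in an EAP HA profile this could look like the following (the value 3 is just an example attempt count; the surrounding stack definition is omitted):

<!-- Limit JOIN retries: give up after 3 unanswered attempts
     instead of retrying indefinitely (0 = infinite). -->
<protocol type="pbcast.GMS">
    <property name="max_join_attempts">3</property>
</protocol>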
AZURE_PING can also be configured to remove some or all discovery files when the coordinator changes, by setting the remove_old_coords_on_view_change or remove_all_files_on_view_change property to true.
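Similarly, a sketch of what the AZURE_PING protocol entry with cleanup enabled might look like (the protocol type name and the storage account values below are placeholders and depend on your setup):

<!-- Discovery via Azure blob storage; credential values are placeholders. -->
<protocol type="azure.AZURE_PING">
    <property name="storage_account_name">my_account</property>
    <property name="storage_access_key">my_access_key</property>
    <property name="container">ping-container</property>
    <!-- On a coordinator change, delete the discovery files of former
         coordinators. Set remove_all_files_on_view_change to true
         instead to delete all discovery files. -->
    <property name="remove_old_coords_on_view_change">true</property>
</protocol>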
Here's the JGroups manual entry that describes both solutions (the "Removal of zombie files" section): http://www.jgroups.org/manual/index.html#FILE_PING
The linked JIRAs contain more information, especially JBEAP-6017. Note that if users make sure the containers they use for AZURE_PING are always clean, these settings are not necessary.
blocks
- JBEAP-4617 Docs for configuring EAP on Azure (Closed)
is related to
- JBEAP-6017 AZURE_PING leaves behind files after its jvm is killed (Closed)
- JBEAP-6016 AZURE_PING does not remove records of members that have left the cluster (Closed)