Bug
Resolution: Cannot Reproduce
Major
2.5
None
Service A starts and becomes the coordinator. Once it has fully started, Service B is started. As part of the join, Service B:
1) Gets the members using TCPPING
2) Determines the coordinator
3) Joins
4) Applies the view change via installView. This readjusts the membership and closes any connections to peers that are no longer members (so the connection to Service A is removed).
5) Requests the cache from the coordinator. In response, Service A tries to send the cache but fails because the peer connection has been closed. It tries twice and then removes the connection. Service B times out and retries, and this time the request succeeds. This happens every time. I don't think this should happen: Service A should return the cache, since it knows where it needs to be sent.
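To make the sequence above easier to follow, here is a minimal, hypothetical model of it. JoinRaceSketch, Conn, coordConns and sendCache are invented names, not JGroups classes; the model only assumes what is described in steps 1 to 5: Service A holds a connection to B that B's installView has closed, A tries twice on that dead connection and then drops it, and B's retry succeeds because A then opens a fresh connection.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the reported join sequence; none of these are JGroups APIs. */
public class JoinRaceSketch {

    /** Coordinator-side connection to one peer; 'open' becomes false when the peer closes its end. */
    static final class Conn {
        boolean open = true;
    }

    /** Service A's connection table, keyed by peer name. */
    static final Map<String, Conn> coordConns = new HashMap<>();

    /** Event log so the sequence can be inspected. */
    static final List<String> log = new ArrayList<>();

    /** Service A tries to send the cache to 'peer', retrying twice on the same connection. */
    static boolean sendCache(String peer) {
        Conn c = coordConns.computeIfAbsent(peer, p -> {
            log.add("A: opening new connection to " + p);
            return new Conn();
        });
        for (int attempt = 1; attempt <= 2; attempt++) {
            if (c.open) {
                log.add("A: cache sent to " + peer);
                return true;
            }
            log.add("A: send attempt " + attempt + " to " + peer + " failed (peer closed connection)");
        }
        // After two failures the stale connection is removed, so the next send opens a new one.
        coordConns.remove(peer);
        log.add("A: removed stale connection to " + peer);
        return false;
    }

    /** Runs the join sequence and returns how many cache requests B had to make. */
    static int runJoin() {
        coordConns.clear();
        log.clear();
        // Steps 1-3: B discovers A, finds the coordinator and joins; A now holds a connection to B.
        coordConns.put("B", new Conn());
        // Step 4: B installs the new view and closes the connection, leaving A's end dead.
        coordConns.get("B").open = false;
        // Step 5: B requests the cache and retries after each timeout until it arrives.
        int requests = 0;
        boolean received = false;
        while (!received) {
            requests++;
            log.add("B: cache request #" + requests);
            received = sendCache("B");
            if (!received) {
                log.add("B: timed out waiting for cache, retrying");
            }
        }
        return requests;
    }

    public static void main(String[] args) {
        int requests = runJoin();
        log.forEach(System.out::println);
        System.out.println("requests needed: " + requests);
    }
}
```

In this model the first request always fails and the second always succeeds, which matches the "happens every time" behaviour: the extra round trip is a consequence of the stale connection surviving on A's side, not of where the reply should go.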
I have added extra trace statements (prefixed with APM) to show the flow, for my own understanding. I deliberately set get_cache_timeout to a high value to highlight the problem. For convenience, the zip also contains the source and the protocol properties. There are two logs from my run: cord.log is the coordinator's log and cord1.log is the second service's log.
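Why a high get_cache_timeout makes the problem so visible can be sketched on the requester side: if the first reply is lost, the joiner blocks for the full timeout before retrying, so every join costs at least one timeout. The sketch below is hypothetical; CacheRequestSketch, sendRequest and requestCache are invented names, and getCacheTimeoutMs merely stands in for the get_cache_timeout property. Only standard java.util.concurrent APIs are used.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical requester-side retry loop; not a JGroups API. */
public class CacheRequestSketch {

    /** Simulated coordinator: the first reply is lost (stale connection), later ones arrive. */
    static CompletableFuture<String> sendRequest(AtomicInteger attempts) {
        CompletableFuture<String> reply = new CompletableFuture<>();
        if (attempts.incrementAndGet() > 1) {
            reply.complete("cache-contents");
        }
        // The first request's future is never completed, as if the reply never reached B.
        return reply;
    }

    /** Requests the cache, retrying after each timeout; returns elapsed milliseconds. */
    static long requestCache(long getCacheTimeoutMs, AtomicInteger attempts)
            throws InterruptedException, ExecutionException {
        long start = System.nanoTime();
        while (true) {
            try {
                sendRequest(attempts).get(getCacheTimeoutMs, TimeUnit.MILLISECONDS);
                return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            } catch (TimeoutException e) {
                // No reply within the timeout: retry, as Service B does in the report.
            }
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger attempts = new AtomicInteger();
        long elapsed = requestCache(200, attempts);
        System.out.println("attempts=" + attempts.get() + ", waited at least 200 ms: " + (elapsed >= 200));
    }
}
```

With a lost first reply, the elapsed time grows linearly with the timeout, so a small timeout hides the extra round trip while a large one exposes it.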
Please let me know if there is a workaround or a fix I can apply. If I have misconfigured the properties, please advise how to rectify this.
To run the example, run the bat script, passing in the service name. Note that the service name must be unique, because the log file name is derived from it.