When running a replicated cache and repeatedly calling a cacheable method (using Spring cache abstraction), Infinispan enters an infinite replication loop. This can be confirmed by observing replication counts growing over time, where there are no cache misses.
Caches shouldn't be replicated when there is a cache hit.
- 3 cluster members; asynchronous replication with a replication queue
- a cacheable method is executed repeatedly using 2 different keys
- for some reason, this issue only occurs when using Enum arguments for a cache key; I was not able to replicate this when using int or String types (see com.designamus.infinispan.Main.works())
- the behavior is not deterministic (random), which points to a race condition
- the problem does not seem to be related to the Spring's default cache key generator; I was able to reproduce the same behavior with a custom cache key generator, which was thread-safe
- the cacheable method is executed only twice (once both keys are stored in the cache); subsequent invocations retrieve stored values from the cache; this can be confirmed by inspecting the log file
- the cache doesn't expire and entries are not evicted
- the memory usage grows over time, eventually causing OOM on a heavily loaded system
- since the issue is random in nature it may take a 3-4 attempts to reproduce it; I was successful in reproducing this behavior numerous times
Steps to reproduce:
1. Build a test project (mvn clean compile)
2. Execute /run.sh (this will spawn 3 JVMs)
3. Start JConsole to monitor 3 cluster members (jconsole localhost:17001 localhost:17002 localhost:17003)
4. Monitor "replicationCount" attribute under RpcManager for cache "MyCache" for all JVMs (see /replication-counts.png)
5. Observe that replication counts grow over time
6. Observe that all caches are of size 2 and there are no cache misses (see /cache-statistics.png)
If the issue cannot be reproduced (replication counts stay at the same level):
5. Terminate all 3 JVM processes (as a convenience you can execute /stop.sh)
6. Repeat steps 2 through 5 above
When testing the above scenario using a distributed mode, I observed some other anomalies (i.e. the cacheable method was executed multiple times, as if the value was not there). While this may be related, it deserves a separate JIRA.