ModeShape relies on Infinispan to provide clustering and persistence. Nodes are stored in an Infinispan cache, which in turn can be persisted by configuring one or more cache stores (file, JDBC, etc.).
While an Infinispan cache can participate in a transaction (local or distributed), its underlying cache stores do NOT participate in the transaction.
This is documented in a number of places including:
- the Infinispan documentation http://infinispan.org/docs/6.0.x/user_guide/user_guide.html#_cache_loaders_and_transactional_caches
- this discussion on the Infinispan forums https://developer.jboss.org/thread/251870
- the following Infinispan Jira issues https://issues.jboss.org/browse/ISPN-604, https://issues.jboss.org/browse/ISPN-3882
The potential data loss occurs due to the way Infinispan handles transactional caches. This applies to both optimistic and pessimistic locking. The sequence of events is:
1. Infinispan locks the cache entries being modified. This happens either before the prepare stage, at the time the lock is requested (pessimistic locking), or during the prepare stage of the transaction (optimistic locking).
2. If using pessimistic locking, Infinispan performs a one-phase commit (as the locks have already been acquired prior to the prepare phase). Failure to acquire a lock results in the transaction being blocked until it times out, at which point it is rolled back.
3. If using optimistic locking, Infinispan performs a two-phase commit and acquires the locks during the prepare phase. Failure to lock during the prepare phase results in the transaction being rolled back.
4. If all of the locks are successfully acquired, Infinispan moves to the commit phase.
5. The transaction is suspended.
6. Infinispan first writes to the cache stores during the commit phase. Each modification is written to each configured store in turn. If an error occurs while writing to a cache store, a persistence exception is thrown and the commit phase fails. At this point any modifications already made to the store are NOT rolled back, but the cache is NOT updated, so the cache store and the cache end up with different entries. The transaction is marked as in doubt and needs to be manually recovered (if using XA); otherwise other resources participating in the transaction may be out of sync as well.
7. The transaction is resumed.
8. If the writes to the cache stores are successful, the cache is updated and the transaction is committed.
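To make steps 6 to 8 concrete, here is a minimal Java sketch of the ordering described above. It is a simplified model, not Infinispan code: SketchStore, SketchTransactionalCache and CommitSequenceSketch are hypothetical names invented for illustration, and locking, suspension and XA recovery are left out entirely.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a non-transactional cache store; not an Infinispan type. */
class SketchStore {
    final Map<String, String> persisted = new LinkedHashMap<>();
    private final String failOnKey; // key whose write fails, or null for no failure

    SketchStore(String failOnKey) { this.failOnKey = failOnKey; }

    void write(String key, String value) {
        if (key.equals(failOnKey)) {
            throw new IllegalStateException("simulated persistence failure for " + key);
        }
        persisted.put(key, value);
    }
}

/** Hypothetical model of the commit phase: store writes first, then the in-memory update. */
class SketchTransactionalCache {
    final Map<String, String> memory = new LinkedHashMap<>();
    final SketchStore store;

    SketchTransactionalCache(SketchStore store) { this.store = store; }

    /** Returns true if the commit completed, false if a store write failed part-way. */
    boolean commit(Map<String, String> modifications) {
        try {
            // Step 6: each modification is written to the store in turn.
            for (Map.Entry<String, String> m : modifications.entrySet()) {
                store.write(m.getKey(), m.getValue());
            }
        } catch (IllegalStateException e) {
            // Earlier store writes are NOT undone and the in-memory cache is NOT updated.
            return false;
        }
        // Step 8: only after all store writes succeed is the cache updated.
        memory.putAll(modifications);
        return true;
    }
}

class CommitSequenceSketch {
    /** Commits three modifications where the second store write fails. */
    static SketchTransactionalCache runFailingCommit() {
        SketchTransactionalCache cache = new SketchTransactionalCache(new SketchStore("b"));
        Map<String, String> tx = new LinkedHashMap<>();
        tx.put("a", "1");
        tx.put("b", "2");
        tx.put("c", "3");
        cache.commit(tx);
        return cache;
    }

    public static void main(String[] args) {
        SketchTransactionalCache cache = runFailingCommit();
        System.out.println("store  = " + cache.store.persisted);
        System.out.println("memory = " + cache.memory);
    }
}
```

In this model the store ends up holding the write for "a" while the in-memory map stays empty, which is the divergence described in step 6.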
If a failure occurs, there are several potential problems:
- The cache store and cache are no longer in sync. Initially the data seen by the application will reflect the state of the cache, but over time, due to either eviction or a restart, it will reflect the state of the store.
- While the failure is logged, no error is thrown to the application tier (at least when using WildFly) because the transaction manager swallows the Infinispan exception. This means that the application is not aware of the issue and returns success. Only when attempting to access the failed data at a later stage does the issue become apparent.
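The first problem can be illustrated with a small Java sketch of a read-through cache whose in-memory copy and store have already diverged. ReadThroughSketch and the "node" key are hypothetical and exist only for illustration. The sketch assumes reads are served from memory first and fall back to the store on a miss, which is a simplification of how a cache with a store behaves.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical read-through cache: memory first, then the store. Not an Infinispan type. */
class ReadThroughSketch {
    final Map<String, String> memory = new HashMap<>();
    final Map<String, String> store = new HashMap<>();

    String get(String key) {
        String value = memory.get(key);
        if (value == null) {
            value = store.get(key); // load from the store on a miss
            if (value != null) {
                memory.put(key, value);
            }
        }
        return value;
    }

    /** Eviction drops only the in-memory copy; the store is left as it is. */
    void evict(String key) {
        memory.remove(key);
    }

    /** Builds the state left behind by a failed commit: the store holds a write the cache never saw. */
    static ReadThroughSketch diverged() {
        ReadThroughSketch cache = new ReadThroughSketch();
        cache.memory.put("node", "old");
        cache.store.put("node", "new");
        return cache;
    }

    public static void main(String[] args) {
        ReadThroughSketch cache = diverged();
        System.out.println("before eviction: " + cache.get("node"));
        cache.evict("node");
        System.out.println("after eviction:  " + cache.get("node"));
    }
}
```

Before eviction the application reads the cache's value; after eviction (or a restart, which empties memory in the same way) it reads the store's value, with no error raised at either point.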
There are a number of reasons why Infinispan does this:
- Most cache stores are non-transactional
- The node managing the transaction may not be the node responsible for updating the cache store and therefore cannot have the cache store participate in the transaction
There is one scenario where having the cache store participate in the transaction does make sense:
- Replicated cache
- Shared JDBC cache store
In this scenario only one node is ever responsible for updating a cache store and it is the same node that manages the transaction. The store is also capable of participating in the transaction but currently does not.
The Infinispan community is aware of this and is discussing allowing transactional cache stores to participate in transactions when it makes sense (https://developer.jboss.org/thread/251870). At this point, however, there is no issue in Jira (that I can find) and no timeline for when this might become available.
I am also planning on proposing a modification to the Cache Store SPI which would allow all modifications in a transaction to be passed to a cache store in a single call. This would give the cache store implementations a better chance of cleaning up after themselves if an error occurs.
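A rough Java sketch of what such a single-call contract could look like follows. None of these types exist in Infinispan: BatchStoreSketch, CompensatingStoreSketch and BatchWriteSketch are hypothetical names, and the snapshot-and-restore cleanup is only one possible strategy (a JDBC store would more likely wrap the batch in a database transaction).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical batch-oriented store contract; NOT part of the Infinispan Cache Store SPI. */
interface BatchStoreSketch {
    /** Receives every modification of one transaction in a single call. */
    void writeBatch(Map<String, String> modifications);
}

/** In-memory implementation that undoes partial work when a write in the batch fails. */
class CompensatingStoreSketch implements BatchStoreSketch {
    final Map<String, String> persisted = new LinkedHashMap<>();
    private final String failOnKey; // key whose write fails, or null for no failure

    CompensatingStoreSketch(String failOnKey) { this.failOnKey = failOnKey; }

    @Override
    public void writeBatch(Map<String, String> modifications) {
        // Snapshot taken before the batch so a failure can restore the previous state.
        Map<String, String> before = new LinkedHashMap<>(persisted);
        try {
            for (Map.Entry<String, String> m : modifications.entrySet()) {
                if (m.getKey().equals(failOnKey)) {
                    throw new IllegalStateException("simulated persistence failure for " + m.getKey());
                }
                persisted.put(m.getKey(), m.getValue());
            }
        } catch (RuntimeException e) {
            // Clean up the partial writes, then let the caller see the failure.
            persisted.clear();
            persisted.putAll(before);
            throw e;
        }
    }
}

class BatchWriteSketch {
    /** Runs a batch whose second write fails and returns the store afterwards. */
    static CompensatingStoreSketch runFailingBatch() {
        CompensatingStoreSketch store = new CompensatingStoreSketch("b");
        store.persisted.put("x", "0"); // state from an earlier, successful transaction
        Map<String, String> tx = new LinkedHashMap<>();
        tx.put("a", "1");
        tx.put("b", "2");
        try {
            store.writeBatch(tx);
        } catch (IllegalStateException expected) {
            // The failure is still reported, but the store has restored its prior state.
        }
        return store;
    }

    public static void main(String[] args) {
        System.out.println("store after failed batch = " + runFailingBatch().persisted);
    }
}
```

Because the store sees the whole transaction at once, it knows where the batch begins and ends and can undo the write for "a" when "b" fails, which is not possible when modifications arrive one call at a time.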
This problem is a limitation of Infinispan, and unless ModeShape moves away from using Infinispan for persistence there is not much that can currently be done. I’ve opened this issue to:
a) make the ModeShape community aware of the problem and track the issue
b) suggest the ModeShape documentation be updated to caution that this issue exists
c) track the progress of updating Infinispan to support transactional cache stores so that ModeShape can take advantage of the feature if and when it can.