- Bug
- Resolution: Unresolved
- Major
- None
- 12.0.0.Final
- None
- Undefined
When the node that owns a clustered lock leaves the cluster, ClusteredLockImpl.ClusterChangeListener is supposed to release the lock. But if the org.infinispan.LOCKS cache is in DEGRADED mode, the lock release fails and an error is logged:
```
22:01:29,500 ERROR (jgroups-9,Test-NodeD:[]) [CacheManagerNotifierImpl] ISPN000405: Caught exception while invoking a cache manager listener!
org.infinispan.commons.CacheListenerException: ISPN000280: Caught exception [org.infinispan.partitionhandling.AvailabilityException] while invoking method [public void org.infinispan.lock.impl.lock.ClusteredLockImpl$ClusterChangeListener.viewChange(org.infinispan.notifications.cachemanagerlistener.event.ViewChangedEvent)] on listener instance: org.infinispan.lock.impl.lock.ClusteredLockImpl$ClusterChangeListener@3c91530d
	at org.infinispan.notifications.impl.AbstractListenerImpl$ListenerInvocationImpl.lambda$invoke$1(AbstractListenerImpl.java:430)
	at org.infinispan.notifications.impl.AbstractListenerImpl$ListenerInvocationImpl.invoke(AbstractListenerImpl.java:450)
	at org.infinispan.notifications.cachemanagerlistener.CacheManagerNotifierImpl.invokeListener(CacheManagerNotifierImpl.java:157)
	at org.infinispan.notifications.cachemanagerlistener.CacheManagerNotifierImpl.invokeListeners(CacheManagerNotifierImpl.java:84)
	at org.infinispan.notifications.cachemanagerlistener.CacheManagerNotifierImpl.notifyViewChange(CacheManagerNotifierImpl.java:103)
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.receiveClusterView(JGroupsTransport.java:737)
	...
Caused by: org.infinispan.partitionhandling.AvailabilityException: ISPN000306: Key 'ClusteredLockKey{name=ConsistentReliabilitySplitBrainTest}' is not available. Not all owners are in this partition
	at org.infinispan.partitionhandling.impl.PartitionHandlingManagerImpl.doCheck(PartitionHandlingManagerImpl.java:272)
	at org.infinispan.partitionhandling.impl.PartitionHandlingManagerImpl.checkRead(PartitionHandlingManagerImpl.java:114)
	at org.infinispan.factories.InternalCacheFactory$PartitionHandlingCache.get(InternalCacheFactory.java:308)
	at org.infinispan.factories.InternalCacheFactory$PartitionHandlingCache.get(InternalCacheFactory.java:306)
	at org.infinispan.factories.InternalCacheFactory$AbstractGetAdvancedCache.containsKey(InternalCacheFactory.java:257)
	at org.infinispan.cache.impl.AbstractDelegatingCache.containsKey(AbstractDelegatingCache.java:384)
	at org.infinispan.cache.impl.EncoderCache.containsKey(EncoderCache.java:618)
	at org.infinispan.lock.impl.manager.EmbeddedClusteredLockManager.isDefined(EmbeddedClusteredLockManager.java:157)
	at org.infinispan.lock.impl.lock.ClusteredLockImpl$ClusterChangeListener.viewChange(ClusteredLockImpl.java:335)
	at jdk.internal.reflect.GeneratedMethodAccessor29.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:564)
	at org.infinispan.notifications.impl.AbstractListenerImpl$ListenerInvocationImpl.lambda$invoke$1(AbstractListenerImpl.java:424)
```
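The failure path in the trace can be modeled in miniature (plain Java, no Infinispan dependencies; the class and method names below only mimic the real ones, and the DEGRADED state is reduced to a boolean): an exception thrown by the listener's cache check propagates up to the notifier, which merely logs it, so the lock release is silently lost.

```java
// Toy model of the listener path in the stack trace. All names are
// stand-ins for the real Infinispan classes, not the actual code.
public class ListenerFailureSketch {
    static class AvailabilityException extends RuntimeException {}

    public static boolean lockReleased = false;

    // Models EmbeddedClusteredLockManager.isDefined() on a DEGRADED cache:
    // the read check throws instead of answering.
    static boolean isDefined(boolean degraded) {
        if (degraded) throw new AvailabilityException();
        return true;
    }

    // Models ClusterChangeListener.viewChange(): it checks the cache
    // before releasing the departed owner's lock.
    static void viewChange(boolean degraded) {
        if (isDefined(degraded)) {
            lockReleased = true; // release for the departed owner
        }
    }

    public static void main(String[] args) {
        try {
            viewChange(true); // view change arrives while DEGRADED
        } catch (AvailabilityException e) {
            // The real notifier just logs ISPN000405 here; the release
            // never happened and is never retried later.
            System.out.println("release skipped: " + lockReleased);
        }
    }
}
```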
When the cache returns to AVAILABLE mode, there is no further check to verify whether the lock owner has rejoined the cluster, so the lock may remain owned forever by a crashed node.
E.g. the initial cluster is ABCD, and D owns clustered lock L:
- The cluster splits into 3 partitions: AB, C, D
- LOCKS cache enters DEGRADED mode
- A and B try to unlock L, but fail
- D crashes
- C merges back with AB
- LOCKS cache becomes AVAILABLE
- L remains owned by D
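The sequence above can be replayed with a toy simulation (plain Java; the DEGRADED availability check is reduced to a boolean flag and is not the real partition-handling logic). It shows that once the failed unlocks are dropped, nothing reclaims the lock after the merge:

```java
// Toy replay of scenario 1: unlock attempts fail while DEGRADED,
// and no re-check runs when the cache becomes AVAILABLE again.
public class StaleLockScenario {
    public static String lockOwner = "D"; // D owns clustered lock L
    public static boolean degraded = false;

    // Models an unlock that is rejected while the cache is DEGRADED
    // (the real code throws AvailabilityException instead).
    static boolean tryUnlock(String node) {
        if (degraded) return false;
        lockOwner = null;
        return true;
    }

    public static void main(String[] args) {
        degraded = true;                    // cluster splits: AB | C | D
        boolean aUnlocked = tryUnlock("A"); // A and B try to unlock L...
        boolean bUnlocked = tryUnlock("B"); // ...but both attempts fail
        // D crashes; C merges back with AB; cache becomes AVAILABLE.
        degraded = false;
        // No further check runs, so L is still owned by crashed node D:
        System.out.println(aUnlocked + " " + bUnlocked + " " + lockOwner);
        // prints "false false D"
    }
}
```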
Unlocking locks on cluster view changes is itself problematic. Because the LOCKS cache only enters DEGRADED mode after the cluster view change, if the LOCKS cache is distributed it is theoretically possible for a lock to be unlocked in one partition and for its owner to merge back afterwards:
E.g. the initial cluster is ABCD, and D owns clustered lock L:
- The cluster splits into 2 partitions: AB and CD
- A and B are the 2 owners of L, and A unlocks L
- The LOCKS cache enters DEGRADED mode
- The partitions merge back
- The LOCKS cache becomes AVAILABLE again
- D thinks it still owns L, but other nodes are able to acquire it
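This second sequence can likewise be sketched (plain Java; each partition's view of the LOCKS cache is modeled as a separate map, a deliberate simplification of the real distributed cache). It shows the window in which one side frees the lock while the other side still believes it holds it:

```java
import java.util.HashMap;
import java.util.Map;

// Toy replay of scenario 2: with a distributed LOCKS cache, the AB
// partition can process an unlock before entering DEGRADED mode,
// while D on the other side still believes it holds the lock.
public class DoubleOwnerScenario {
    public static boolean freeAfterMerge;
    public static boolean dThinksItOwnsL;

    public static void main(String[] args) {
        // State as each side of the split sees it after the view change
        Map<String, String> abPartition = new HashMap<>(Map.of("L", "D"));
        Map<String, String> cdPartition = new HashMap<>(Map.of("L", "D"));

        // A and B hold the copies of L's entry in their partition, so
        // A's unlock succeeds before the cache turns DEGRADED:
        abPartition.remove("L");

        // Partitions merge; if AB's state wins, L is free cluster-wide...
        Map<String, String> merged = abPartition;
        freeAfterMerge = !merged.containsKey("L");

        // ...but D never observed the unlock and still thinks it owns L:
        dThinksItOwnsL = "D".equals(cdPartition.get("L"));

        System.out.println(freeAfterMerge + " " + dThinksItOwnsL);
        // prints "true true": another node can acquire L while D holds it
    }
}
```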
- is related to: ISPN-13352 "locks are not cleaned up after node leaves" (status: New)