Uploaded image for project: 'Infinispan'
  1. Infinispan
  2. ISPN-12714

Cluster locks can stay locked by crashed nodes

    XMLWordPrintable

Details

    • Bug
    • Resolution: Unresolved
    • Major
    • None
    • 12.0.0.Final
    • Clustered Locks
    • None
    • Undefined

    Description

      When the node that owns a clustered lock leaves the cluster, ClusteredLockImpl.ClusterChangeListener is supposed to release the lock. But if the org.infinispan.LOCKS cache is in DEGRADED mode, the lock release fails and an error is logged:

      22:01:29,500 ERROR (jgroups-9,Test-NodeD:[]) [CacheManagerNotifierImpl] ISPN000405: Caught exception while invoking a cache manager listener!
      org.infinispan.commons.CacheListenerException: ISPN000280: Caught exception [org.infinispan.partitionhandling.AvailabilityException] while invoking method [public void org.infinispan.lock.impl.lock.ClusteredLockImpl$ClusterChangeListener.viewChange(org.infinispan.notifications.cachemanagerlistener.event.ViewChangedEvent)] on listener instance: org.infinispan.lock.impl.lock.ClusteredLockImpl$ClusterChangeListener@3c91530d
      	at org.infinispan.notifications.impl.AbstractListenerImpl$ListenerInvocationImpl.lambda$invoke$1(AbstractListenerImpl.java:430)
      	at org.infinispan.notifications.impl.AbstractListenerImpl$ListenerInvocationImpl.invoke(AbstractListenerImpl.java:450)
      	at org.infinispan.notifications.cachemanagerlistener.CacheManagerNotifierImpl.invokeListener(CacheManagerNotifierImpl.java:157)
      	at org.infinispan.notifications.cachemanagerlistener.CacheManagerNotifierImpl.invokeListeners(CacheManagerNotifierImpl.java:84)
      	at org.infinispan.notifications.cachemanagerlistener.CacheManagerNotifierImpl.notifyViewChange(CacheManagerNotifierImpl.java:103)
      	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.receiveClusterView(JGroupsTransport.java:737)
              ...
      Caused by: org.infinispan.partitionhandling.AvailabilityException: ISPN000306: Key 'ClusteredLockKey{name=ConsistentReliabilitySplitBrainTest}' is not available. Not all owners are in this partition
      	at org.infinispan.partitionhandling.impl.PartitionHandlingManagerImpl.doCheck(PartitionHandlingManagerImpl.java:272)
      	at org.infinispan.partitionhandling.impl.PartitionHandlingManagerImpl.checkRead(PartitionHandlingManagerImpl.java:114)
      	at org.infinispan.factories.InternalCacheFactory$PartitionHandlingCache.get(InternalCacheFactory.java:308)
      	at org.infinispan.factories.InternalCacheFactory$PartitionHandlingCache.get(InternalCacheFactory.java:306)
      	at org.infinispan.factories.InternalCacheFactory$AbstractGetAdvancedCache.containsKey(InternalCacheFactory.java:257)
      	at org.infinispan.cache.impl.AbstractDelegatingCache.containsKey(AbstractDelegatingCache.java:384)
      	at org.infinispan.cache.impl.EncoderCache.containsKey(EncoderCache.java:618)
      	at org.infinispan.lock.impl.manager.EmbeddedClusteredLockManager.isDefined(EmbeddedClusteredLockManager.java:157)
      	at org.infinispan.lock.impl.lock.ClusteredLockImpl$ClusterChangeListener.viewChange(ClusteredLockImpl.java:335)
      	at jdk.internal.reflect.GeneratedMethodAccessor29.invoke(Unknown Source)
      	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.base/java.lang.reflect.Method.invoke(Method.java:564)
      	at org.infinispan.notifications.impl.AbstractListenerImpl$ListenerInvocationImpl.lambda$invoke$1(AbstractListenerImpl.java:424)
      

      When the cache goes back to AVAILABLE mode, there is no other check to see if the lock owner has come back into the cluster or not, so the lock may stay forever owned by a crashed node.

      E.g. the initial cluster is ABCD, D owns clustered lock L

      1. The cluster splits into 3 partitions: AB, C, D
      2. LOCKS cache enters DEGRADED mode
      3. A and B try to unlock L, but fail
      4. D crashes
      5. C merges back with AB
      6. LOCKS cache becomes AVAILABLE
      7. L remains owned by D

      Unlocking the locks on cluster view changes is also problematic. Because the LOCKS cache enters DEGRADED mode after the cluster view change, if the LOCKS cache is distributed, then it is theoretically possible for a lock to be unlocked and then for its owner to merge back:

      E.g. the initial cluster is ABCD, D owns clustered lock L

      1. The cluster splits into 2 partitions: AB and CD
      2. A and B are the 2 owners of L, and A unlocks L
      3. The LOCKS cache enters DEGRADED mode
      4. The partitions merge back
      5. The LOCKS cache becomes AVAILABLE again
      6. D thinks it still owns L, but other nodes are able to acquire it

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              dberinde@redhat.com Dan Berindei (Inactive)
              Votes:
              1 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated: