  1. Red Hat Data Grid
  2. JDG-3848

Another node leaving may break SingleFileStore

    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • RHDG 7.3.7.GA
    • RHDG 7.3.4 GA
    • Clustering
    • None

      When an OutboundTransferTask on node A fails to send state to node B, it cancels itself but does not immediately stop iterating over the entries. Instead, it sets the interrupted flag on the current thread and keeps trying to send chunks:

      2020-05-27 04:59:51,298 DEBUG [org.infinispan.statetransfer.OutboundTransferTask] (infinispan 67656) Cancelling outbound transfer to node jboss90434-22044, segments {0-255}
      ...
      2020-05-27 04:59:54,032 DEBUG [org.infinispan.statetransfer.OutboundTransferTask] (infinispan 67248) Node jboss90434-22044 left cache assignedproductcache while we were sending state to it, cancelling transfer.
      2020-05-27 04:59:54,032 DEBUG [org.infinispan.statetransfer.OutboundTransferTask] (infinispan 67248) Node jboss90434-22044 left cache assignedproductcache while we were sending state to it, cancelling transfer.
      2020-05-27 04:59:54,033 DEBUG [org.infinispan.statetransfer.OutboundTransferTask] (infinispan 67248) Node jboss90434-22044 left cache assignedproductcache while we were sending state to it, cancelling transfer.
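
      The pattern described above can be illustrated with a minimal sketch. This is not the OutboundTransferTask code: the class name, the method, and the simulated send are hypothetical stand-ins. It only shows that setting the interrupted flag does not end a loop by itself, so the iteration keeps running unless it also checks for cancellation.

```java
class CancelLoopSketch {

    /**
     * Iterates over `entries` simulated entries. The simulated send starts
     * failing at index `targetLeavesAt` (hypothetical: the target node left).
     * Returns how many iterations still ran after the task cancelled itself.
     */
    static int attemptsAfterCancel(int entries, int targetLeavesAt, boolean stopOnCancel) {
        boolean cancelled = false;
        int attemptsAfterCancel = 0;
        try {
            for (int i = 0; i < entries; i++) {
                if (cancelled) {
                    if (stopOnCancel) {
                        break; // the loop honours the cancellation
                    }
                    attemptsAfterCancel++; // the loop keeps going despite the cancellation
                }
                boolean sent = i < targetLeavesAt; // hypothetical send result
                if (!sent && !cancelled) {
                    cancelled = true;
                    // Only marks the thread; it does not stop the iteration.
                    Thread.currentThread().interrupt();
                }
            }
        } finally {
            Thread.interrupted(); // clear the flag so the demo leaves no side effects
        }
        return attemptsAfterCancel;
    }

    public static void main(String[] args) {
        System.out.println("without check: " + attemptsAfterCancel(10, 3, false));
        System.out.println("with check:    " + attemptsAfterCancel(10, 3, true));
    }
}
```

      In the variant without the check, every remaining iteration runs with the interrupted flag still set, which is the state in which the store is later read.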
      

      If the cache has a SingleFileStore, the iterator will eventually try to obtain new entries from the store, and the FileChannel.read() operation will fail with a ClosedByInterruptException:

      2020-05-27 05:00:30,803 WARN  [org.infinispan.statetransfer.OutboundTransferTask] (infinispan 67248) ISPN000194: Failed loading keys from cache store: org.infinispan.persistence.spi.PersistenceException: java.nio.channels.ClosedByInterruptException
      	at org.infinispan.persistence.file.SingleFileStore._load(SingleFileStore.java:486) [infinispan-core-9.4.18.Final-redhat-00001.jar:9.4.18.Final-redhat-00001]
      	at org.infinispan.persistence.file.SingleFileStore.lambda$publishEntries$3(SingleFileStore.java:545) [infinispan-core-9.4.18.Final-redhat-00001.jar:9.4.18.Final-redhat-00001]
      	at io.reactivex.internal.operators.flowable.FlowableMap$MapConditionalSubscriber.tryOnNext(FlowableMap.java:123) [rxjava-2.2.4.redhat-00005.jar:]
      	at io.reactivex.internal.operators.flowable.FlowableFromIterable$IteratorConditionalSubscription.slowPath(FlowableFromIterable.java:373) [rxjava-2.2.4.redhat-00005.jar:]
      	at io.reactivex.internal.operators.flowable.FlowableFromIterable$BaseRangeSubscription.request(FlowableFromIterable.java:124) [rxjava-2.2.4.redhat-00005.jar:]
      	at io.reactivex.internal.subscribers.BasicFuseableConditionalSubscriber.request(BasicFuseableConditionalSubscriber.java:152) [rxjava-2.2.4.redhat-00005.jar:]
      	at io.reactivex.internal.subscribers.BasicFuseableSubscriber.request(BasicFuseableSubscriber.java:153) [rxjava-2.2.4.redhat-00005.jar:]
      	at io.reactivex.internal.subscriptions.SubscriptionHelper.setOnce(SubscriptionHelper.java:249) [rxjava-2.2.4.redhat-00005.jar:]
      	at io.reactivex.internal.operators.flowable.BlockingFlowableIterable$BlockingFlowableIterator.onSubscribe(BlockingFlowableIterable.java:128) [rxjava-2.2.4.redhat-00005.jar:]
      	at io.reactivex.internal.subscribers.BasicFuseableSubscriber.onSubscribe(BasicFuseableSubscriber.java:67) [rxjava-2.2.4.redhat-00005.jar:]
      	at io.reactivex.internal.subscribers.BasicFuseableConditionalSubscriber.onSubscribe(BasicFuseableConditionalSubscriber.java:66) [rxjava-2.2.4.redhat-00005.jar:]
      	at io.reactivex.internal.operators.flowable.FlowableFromIterable.subscribe(FlowableFromIterable.java:66) [rxjava-2.2.4.redhat-00005.jar:]
      	at io.reactivex.internal.operators.flowable.FlowableFromIterable.subscribeActual(FlowableFromIterable.java:47) [rxjava-2.2.4.redhat-00005.jar:]
      	at io.reactivex.Flowable.subscribe(Flowable.java:14636) [rxjava-2.2.4.redhat-00005.jar:]
      	at io.reactivex.internal.operators.flowable.FlowableMap.subscribeActual(FlowableMap.java:35) [rxjava-2.2.4.redhat-00005.jar:]
      	at io.reactivex.Flowable.subscribe(Flowable.java:14636) [rxjava-2.2.4.redhat-00005.jar:]
      	at io.reactivex.internal.operators.flowable.FlowableFilter.subscribeActual(FlowableFilter.java:37) [rxjava-2.2.4.redhat-00005.jar:]
      	at io.reactivex.Flowable.subscribe(Flowable.java:14636) [rxjava-2.2.4.redhat-00005.jar:]
      	at io.reactivex.internal.operators.flowable.BlockingFlowableIterable.iterator(BlockingFlowableIterable.java:42) [rxjava-2.2.4.redhat-00005.jar:]
      	at io.reactivex.Flowable.blockingForEach(Flowable.java:5606) [rxjava-2.2.4.redhat-00005.jar:]
      	at org.infinispan.statetransfer.OutboundTransferTask.run(OutboundTransferTask.java:198) [infinispan-core-9.4.18.Final-redhat-00001.jar:9.4.18.Final-redhat-00001]
      Caused by: java.nio.channels.ClosedByInterruptException
      	at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) [rt.jar:1.8.0_242]
      	at sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:740) [rt.jar:1.8.0_242]
      	at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:721) [rt.jar:1.8.0_242]
      	at org.infinispan.persistence.file.SingleFileStore._load(SingleFileStore.java:484) [infinispan-core-9.4.18.Final-redhat-00001.jar:9.4.18.Final-redhat-00001]
      	... 27 more
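
      The failure mode can be reproduced outside Infinispan with plain java.nio. The sketch below is an illustration only (the class name and the temporary file are assumptions, not part of SingleFileStore): it sets the interrupted flag on the current thread and then calls FileChannel.read(), which is documented to close the channel and throw a ClosedByInterruptException in that situation.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class InterruptedReadDemo {

    /** Returns {name of the exception thrown by read(), channel.isOpen() afterwards}. */
    static String[] run() throws IOException {
        Path file = Files.createTempFile("interrupted-read", ".dat");
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // Same state as the cancelled transfer thread: flag set, no exception yet.
            Thread.currentThread().interrupt();
            String thrown = "none";
            try {
                channel.read(ByteBuffer.allocate(16), 0);
            } catch (ClosedByInterruptException e) {
                thrown = e.getClass().getSimpleName();
            }
            return new String[] {thrown, String.valueOf(channel.isOpen())};
        } finally {
            Thread.interrupted(); // clear the flag before touching the file system again
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        String[] result = run();
        System.out.println("thrown=" + result[0] + ", open=" + result[1]);
    }
}
```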
      

      The interrupt that raises the ClosedByInterruptException also closes SingleFileStore's only FileChannel, and from then on every store operation fails with a ClosedChannelException:

      Caused by: java.nio.channels.ClosedChannelException
      	at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:110) [rt.jar:1.8.0_242]
      	at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:715) [rt.jar:1.8.0_242]
      	at org.infinispan.persistence.file.SingleFileStore._load(SingleFileStore.java:484) [infinispan-core-9.4.18.Final-redhat-00001.jar:9.4.18.Final-redhat-00001]
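
      A short sketch of why the damage outlives the interrupted thread, again using plain java.nio rather than Infinispan code (the class and helper names are hypothetical). One thread reads a shared FileChannel while its interrupted flag is set; afterwards a second thread that was never interrupted reads the same channel and fails as well, because the channel itself is closed.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class SharedChannelDemo {

    /** Returns the simple name of the exception thrown by a read, or "ok". */
    static String tryRead(FileChannel channel) {
        try {
            channel.read(ByteBuffer.allocate(16), 0);
            return "ok";
        } catch (IOException e) {
            return e.getClass().getSimpleName();
        }
    }

    /** Performs the read on a fresh thread whose interrupted flag is not set. */
    static String readOnNewThread(FileChannel channel) throws InterruptedException {
        String[] result = new String[1];
        Thread reader = new Thread(() -> result[0] = tryRead(channel));
        reader.start();
        reader.join();
        return result[0];
    }

    /** Returns {result on the interrupted thread, result on a clean thread afterwards}. */
    static String[] run() throws IOException, InterruptedException {
        Path file = Files.createTempFile("shared-channel", ".dat");
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            Thread.currentThread().interrupt();
            String interrupted = tryRead(channel); // closes the shared channel
            Thread.interrupted();                  // clearing the flag does not reopen it
            String clean = readOnNewThread(channel);
            return new String[] {interrupted, clean};
        } finally {
            Thread.interrupted();
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws Exception {
        String[] result = run();
        System.out.println("interrupted thread: " + result[0]);
        System.out.println("clean thread:       " + result[1]);
    }
}
```

      Clearing the interrupted flag is not enough to recover: a closed FileChannel cannot be reopened, so only a new channel on the same file would serve further reads.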
      

              dberinde@redhat.com Dan Berindei (Inactive)