Infinispan / ISPN-14848

Deadlock during expiration of entries with LOCAL cache

      Problem description

      We are running into a complete deadlock with Infinispan 14.0.9. External connections to Infinispan are no longer processed and simply time out.

      The thread dump is attached.

      The important parts are:

      • Three non-blocking threads are blocked while trying to expire an entry (the affected threads are non-blocking-thread--p2-t1, non-blocking-thread--p2-t2 and non-blocking-thread--p2-t3).
      • non-blocking-thread--p2-t4 is also trying to expire an entry and holds the lock (0x00000000c5500218) that the other three threads are waiting for.
      • expiration-thread--p5-t1 is the reaper thread for entry expiration and is also completely stuck. The future it waits for is never completed.

      I also have a complete heap dump (available on request, just ask).

      The reaper thread is stuck in "deleteFromStoresAndNotify" (https://github.com/infinispan/infinispan/blob/a432d0a95c3c95722f73a56ba7fabbd0b0753b6b/core/src/main/java/org/infinispan/expiration/impl/ExpirationManagerImpl.java#L202). Following the implementation from there eventually leads to "deleteFromAllStores" (https://github.com/infinispan/infinispan/blob/a432d0a95c3c95722f73a56ba7fabbd0b0753b6b/core/src/main/java/org/infinispan/persistence/manager/PersistenceManagerImpl.java#L735).

      While digging through the heap dump I found a queue of Futures waiting to be executed next on the netty thread pool, and the Future from deleteFromAllStores is among them. The JDBC delete itself was executed successfully (Future.result = true), but the "thenAccept" part (deleteFromAllStores) was never executed (Future.result = null).

      My guess is that the whole thread pool is already blocked, so the thenAccept Future cannot run. But because it does not run, all the threads in the pool stay blocked. Thus we are in a classic deadlock situation.
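
      To illustrate the suspected mechanism, here is a minimal plain-JDK sketch. It is not Infinispan code: the class PoolStarvationSketch, the method starves and the variable names are hypothetical, and the fixed pool only stands in for the non-blocking thread pool. Every pool thread blocks on something that only a queued continuation can release, so the continuation never gets a thread, although the future it depends on has already completed with true.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class PoolStarvationSketch {

    /**
     * Returns true if the continuation could not run within the timeout
     * because all pool threads were blocked waiting for it.
     */
    static boolean starves(int poolThreads, int blockedWorkers, long timeoutMillis) throws Exception {
        if (blockedWorkers > poolThreads) {
            throw new IllegalArgumentException("blockedWorkers must not exceed poolThreads");
        }
        ExecutorService pool = Executors.newFixedThreadPool(poolThreads);
        try {
            CountDownLatch workersRunning = new CountDownLatch(blockedWorkers);
            CountDownLatch released = new CountDownLatch(1);

            // Stand-in for the JDBC delete and its follow-up stage.
            CompletableFuture<Boolean> storeDelete = new CompletableFuture<>();
            CompletableFuture<Void> continuation =
                    storeDelete.thenAcceptAsync(deleted -> released.countDown(), pool);

            // Workers occupy pool threads and block until the continuation releases them.
            for (int i = 0; i < blockedWorkers; i++) {
                pool.execute(() -> {
                    workersRunning.countDown();
                    try {
                        released.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            workersRunning.await();

            // The "delete" succeeds, but its continuation is only queued on the pool.
            storeDelete.complete(true);
            try {
                continuation.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return false; // a free pool thread ran the continuation
            } catch (TimeoutException e) {
                return true;  // storeDelete is done, continuation never ran
            }
        } finally {
            pool.shutdownNow(); // interrupts the blocked workers
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("3 threads, 3 blocked: starved = " + starves(3, 3, 200));
        System.out.println("4 threads, 3 blocked: starved = " + starves(4, 3, 2000));
    }
}
```

      With as many blocked workers as pool threads, the continuation starves. With one spare thread it runs and releases the workers. Whether this is what actually happens inside Infinispan is still only my guess.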

      This analysis is of course based on my very limited knowledge of Infinispan. Something else that sticks out to me is that the non-blocking threads are actually blocking. Are they supposed to do that?

      Reproducer

      We are still working on a reproducer and will post this here once we have something that reliably shows the problem.

      Potentially related issues

      1. ISPN-9798
      2. ISPN-11101
      3. ISPN-5599

      Background/setup

      Our goal is to achieve session persistence within Keycloak. To achieve this goal we configured an external Infinispan server that Keycloak connects to.

      We are using local caches that are backed by a JdbcStringBasedStore like this:

      <local-cache-configuration name="postgres-cache-config" xmlns:jdbc="urn:infinispan:config:store:jdbc:13.0">
          <!-- https://infinispan.org/docs/stable/titles/configuring/configuring.html#tx_configuration -->
          <transaction mode="FULL_XA"/>
          <persistence>
              <jdbc:string-keyed-jdbc-store>
                  <jdbc:data-source jndi-url="jdbc/postgres"/>
                  <jdbc:string-keyed-table drop-on-exit="false" create-on-start="true" prefix="infinispan">
                      <jdbc:id-column name="id" type="VARCHAR"/>
                      <jdbc:data-column name="datum" type="BYTEA"/>
                      <jdbc:timestamp-column name="version" type="BIGINT"/>
                      <jdbc:segment-column name="segment" type="INT"/>
                  </jdbc:string-keyed-table>
              </jdbc:string-keyed-jdbc-store>
          </persistence>
      </local-cache-configuration>
      
      <local-cache name="authenticationSessions" configuration="postgres-cache-config"/> 
      ... more stores of the same kind here ...
      
      <connection-pool initial-size="1"
                       max-size="5"
                       min-size="1"
                       background-validation="1000"
                       idle-removal="1"
                       blocking-timeout="5000"
                       leak-detection="10000"/>

      This setup used to work well with older versions of Infinispan.

      Zulip discussion

      https://infinispan.zulipchat.com/#narrow/stream/118645-infinispan/topic/Deadlock.20during.20expiration.20of.20entries.20with.20JDBCStringBasedSt

      Feel free to reach out to me directly. I am also available on Zulip if that helps.

       

              wburns@redhat.com Will Burns
              mm-matthias Matthias Kesternich (Inactive)