Uploaded image for project: 'Infinispan'
  1. Infinispan
  2. ISPN-14899

CacheLoaderInterceptor piggy back leaves orphaned pending operation with availability exception

XMLWordPrintable

      We are using Infinispan with Keycloak and suffer from hanging threads in certain database disconnect/reconnect scenarios.

      I've debugged into the issue and gathered these breadcrumbs:

      1. We are using a JdbcStringBasedStore against a Postgres database.
      2. We take the Postgres database offline (can be simulated easily, see take-store-offline.txt for reference).
      3. Keycloak makes request #0 for key K to the store. This produces a trace log like shown in log-request-0.txt.
      4. We take the Postgres database back online (see take-store-online.txt).
      5. Keycloak makes request #1 for key K to the store. This produces a trace log like shown in log-request-1.txt. Especially interesting is the line containing "Piggybacking". This request will never finish.
      6. Further requests for key K will just piggyback over and over again. These requests will also never finish.

      I think the problem boils down to this chain of events in Infinispan:

      1. Request #0 for key K arrives
        1.  The GetCacheEntryCommand is processed. 
        2. The cache loader interceptor adds the request to its pendingLoads here.
        3. The cache loader interceptor calls loadAndStoreInDataContainer.
        4. ln turn this executes loadFromAllStores.
        5. Within this function checkStoreAvailability is called
        6. This raises a StoreUnavailableException here
        7. The exception travels back up the stack until we are here again.
        8. The call to finishLoadInContext here or here is never scheduled, because the exception is just travelling further up the stack.
        9. Because finishLoadInContext is never executed, the command is never removed from the pendingLoads here.
        10. At this point request #0 finishes and returns an exception back to the calling client.
      2. Request #1 for key K arrives
        1. The command is processed again.
        2. This time pendingLoads actually contains an element, because this was never cleared in request #0.
        3. As a consequence this part of the code is executed.
        4. The piggybacking message will be printed and the piggy backing happens.
      3. Problem happened
        1. The pending load from request #0 will never finish, because the exception happened and pendingLoads was not cleared when it happend.
        2. Request #1 will never finish, because it's just piggybacked on the result from request #0 that never materialize.
        3. On the Keycloak/Infinispan client side threads start to hang indefinitely (timeout 1 day), because they are waiting for any of the commands to finish.
        4. Server runs into complete blockage.

      I am available in Zulip.

        1. log-request-0.txt
          18 kB
        2. log-request-1.txt
          3 kB
        3. take-store-offline.txt
          0.2 kB
        4. take-store-online.txt
          0.1 kB

            wburns@redhat.com Will Burns
            mm-matthias Matthias Kesternich (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

              Created:
              Updated:
              Resolved: