Type: Bug
Resolution: Done
Priority: Major
Version(s): 13.0.15.Final, 14.0.9.Final
We use Infinispan with Keycloak and are seeing threads hang indefinitely in certain database disconnect/reconnect scenarios.
I debugged the issue and gathered the following observations:
- We are using a JdbcStringBasedStore against a Postgres database.
- We take the Postgres database offline (this is easy to simulate; see take-store-offline.txt).
- Keycloak makes request #0 for key K to the store. This produces a trace log like the one in log-request-0.txt.
- We take the Postgres database back online (see take-store-online.txt).
- Keycloak makes request #1 for key K to the store. This produces a trace log like the one in log-request-1.txt. Note in particular the line containing "Piggybacking". This request never finishes.
- Further requests for key K just piggyback over and over again, and never finish either.
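The symptom can be reproduced on a heavily simplified model. The sketch below is hypothetical and is not Infinispan code: the class PiggybackRepro, the method loadFromStore and the storeOnline flag are illustrative assumptions. It models a store that throws while offline, plus a per-key pending-load map whose entry is only completed and removed on the success path. Unlike the real client, it waits with a short timeout so the hang shows up as "timed out".

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical, heavily simplified model of the scenario above.
// PiggybackRepro, loadFromStore and storeOnline are illustrative names,
// not Infinispan APIs.
public class PiggybackRepro {

    // Stand-in for the Postgres-backed store being offline or online.
    private volatile boolean storeOnline = false;

    // One in-flight load per key; later callers for the same key piggyback on it.
    private final ConcurrentHashMap<String, CompletableFuture<String>> pendingLoads =
            new ConcurrentHashMap<>();

    private String loadFromStore(String key) {
        if (!storeOnline) {
            // Plays the role of the StoreUnavailableException from checkStoreAvailability.
            throw new IllegalStateException("store unavailable");
        }
        return "value-of-" + key;
    }

    // Buggy pattern: the pending entry is completed and removed only on the success path.
    CompletableFuture<String> get(String key) {
        CompletableFuture<String> mine = new CompletableFuture<>();
        CompletableFuture<String> existing = pendingLoads.putIfAbsent(key, mine);
        if (existing != null) {
            System.out.println("Piggybacking on pending load for " + key);
            return existing;
        }
        String value = loadFromStore(key); // may throw synchronously
        mine.complete(value);              // skipped if loadFromStore throws
        pendingLoads.remove(key, mine);    // skipped as well, so the entry leaks
        return mine;
    }

    int pendingLoadCount() {
        return pendingLoads.size();
    }

    // Waits with a bounded timeout instead of hanging like the real client.
    private static String outcome(CompletableFuture<String> f, long timeoutMillis) {
        try {
            return f.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return "timed out";
        } catch (ExecutionException e) {
            return "failed: " + e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    // Request #0 with the store offline, then request #1 after it comes back online.
    String runScenario(long timeoutMillis) {
        String first;
        try {
            first = outcome(get("K"), timeoutMillis);
        } catch (IllegalStateException e) {
            first = "failed: " + e.getMessage(); // request #0 fails synchronously
        }
        storeOnline = true;
        String second = outcome(get("K"), timeoutMillis);
        return first + " / " + second;
    }

    public static void main(String[] args) {
        PiggybackRepro repro = new PiggybackRepro();
        System.out.println(repro.runScenario(500));
        System.out.println("stale pending loads: " + repro.pendingLoadCount());
    }
}
```

Running main prints the piggyback message, then "failed: store unavailable / timed out", then "stale pending loads: 1": request #0 fails, and its never-completed future is left behind for request #1 to wait on.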
I think the problem boils down to this chain of events in Infinispan:
- Request #0 for key K arrives
- The GetCacheEntryCommand is processed.
- The cache loader interceptor adds the request to its pendingLoads here.
- The cache loader interceptor calls loadAndStoreInDataContainer.
- In turn, this executes loadFromAllStores.
- Within this function checkStoreAvailability is called.
- This raises a StoreUnavailableException here.
- The exception travels back up the stack until we are here again.
- The call to finishLoadInContext here or here is never scheduled, because the exception simply propagates further up the stack.
- Because finishLoadInContext is never executed, the command is never removed from pendingLoads here.
- At this point request #0 finishes and returns the exception to the calling client.
- Request #1 for key K arrives
- The command is processed again.
- This time pendingLoads already contains an entry for K, because it was never cleared during request #0.
- As a consequence, this part of the code is executed.
- The "Piggybacking" message is logged and request #1 piggybacks on the pending load from request #0.
- This is where the problem occurs:
- The pending load from request #0 will never finish, because the exception occurred before pendingLoads was cleared.
- Request #1 will never finish, because it piggybacks on a result from request #0 that never materializes.
- On the Keycloak/Infinispan client side, threads start to hang (effectively indefinitely, as the timeout is 1 day) because they are waiting for these commands to finish.
- Eventually the server is completely blocked.
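Following the chain of events above, one possible direction is to guarantee that the pending entry is completed and removed on every path, including when the store throws synchronously. The sketch below applies this to a simplified, hypothetical model rather than patching CacheLoaderInterceptor; PiggybackFix, loadFromStore and storeOnline are illustrative names, and the try/finally stands in for the cleanup this report expects finishLoadInContext to perform.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical simplified model; not Infinispan's actual interceptor code.
public class PiggybackFix {

    // Stand-in for the Postgres-backed store being offline or online.
    private volatile boolean storeOnline = false;

    // One in-flight load per key; later callers for the same key piggyback on it.
    private final ConcurrentHashMap<String, CompletableFuture<String>> pendingLoads =
            new ConcurrentHashMap<>();

    private String loadFromStore(String key) {
        if (!storeOnline) {
            // Plays the role of the StoreUnavailableException.
            throw new IllegalStateException("store unavailable");
        }
        return "value-of-" + key;
    }

    CompletableFuture<String> get(String key) {
        CompletableFuture<String> mine = new CompletableFuture<>();
        CompletableFuture<String> existing = pendingLoads.putIfAbsent(key, mine);
        if (existing != null) {
            return existing; // piggyback on the in-flight load
        }
        try {
            mine.complete(loadFromStore(key));
        } catch (RuntimeException e) {
            // Fail the shared future so piggybacked callers see the error instead of waiting.
            mine.completeExceptionally(e);
        } finally {
            // Always drop our own pending entry, on success and on failure.
            pendingLoads.remove(key, mine);
        }
        return mine;
    }

    int pendingLoadCount() {
        return pendingLoads.size();
    }

    private static String outcome(CompletableFuture<String> f, long timeoutMillis) {
        try {
            return f.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return "timed out";
        } catch (ExecutionException e) {
            return "failed: " + e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    String runScenario(long timeoutMillis) {
        String first = outcome(get("K"), timeoutMillis);  // request #0, store offline
        storeOnline = true;
        String second = outcome(get("K"), timeoutMillis); // request #1, store online
        return first + " / " + second;
    }

    public static void main(String[] args) {
        PiggybackFix fix = new PiggybackFix();
        System.out.println(fix.runScenario(500));
        System.out.println("stale pending loads: " + fix.pendingLoadCount());
    }
}
```

Here main prints "failed: store unavailable / value-of-K" and "stale pending loads: 0": request #0 still fails, but request #1 starts a fresh load once the database is back. Using remove(key, mine) rather than remove(key) ensures a caller only clears the entry it created itself.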
I am available on Zulip.