Bug
Resolution: Done
Major
6.2.0.Final
None
NEW
We encountered soft timeouts on threads executing Drools in our production environment.
Below is the trimmed thread dump:
"CacheWorker:2" id=988 State:RUNNABLE
at java.util.HashMap.getEntry(HashMap.java:446)
at java.util.HashMap.containsKey(HashMap.java:434)
at java.util.HashSet.contains(HashSet.java:201)
at org.drools.core.impl.KnowledgeBaseImpl.addEventListener(KnowledgeBaseImpl.java:252)
at org.jbpm.process.instance.ProcessRuntimeImpl.initProcessEventListeners(ProcessRuntimeImpl.java:303)
at org.jbpm.process.instance.ProcessRuntimeImpl.<init>(ProcessRuntimeImpl.java:115)
at org.jbpm.process.instance.ProcessRuntimeFactoryServiceImpl.newProcessRuntime(ProcessRuntimeFactoryServiceImpl.java:10)
at org.jbpm.process.instance.ProcessRuntimeFactoryServiceImpl.newProcessRuntime(ProcessRuntimeFactoryServiceImpl.java:7)
at org.drools.core.runtime.process.ProcessRuntimeFactory.newProcessRuntime(ProcessRuntimeFactory.java:16)
at org.drools.core.impl.StatefulKnowledgeSessionImpl.createProcessRuntime(StatefulKnowledgeSessionImpl.java:757)
at org.drools.core.impl.StatefulKnowledgeSessionImpl.<init>(StatefulKnowledgeSessionImpl.java:393)
at org.drools.core.impl.StatefulKnowledgeSessionImpl.<init>(StatefulKnowledgeSessionImpl.java:286)
at org.drools.core.common.PhreakWorkingMemoryFactory.createWorkingMemory(PhreakWorkingMemoryFactory.java:21)
at org.drools.core.impl.StatelessKnowledgeSessionImpl.newWorkingMemory(StatelessKnowledgeSessionImpl.java:127)
at org.drools.core.impl.StatelessKnowledgeSessionImpl.execute(StatelessKnowledgeSessionImpl.java:302)
Analysis of the Drools code reveals a possible thread-safety issue. A single instance of KnowledgeBaseImpl is shared among multiple kSessions, but KnowledgeBaseImpl stores its listeners in a plain HashSet, which is not thread-safe:
public final Set<KieBaseEventListener> kieBaseListeners = new HashSet<KieBaseEventListener>();
According to the thread dump, the thread hung in the addEventListener method:
public void addEventListener(KieBaseEventListener listener) {
    if (!kieBaseListeners.contains(listener)) {
        kieBaseListeners.add(listener);
    }
}
A HashSet is backed by a HashMap, as the HashMap frames in the thread dump show. When two threads put into a HashMap at the same time and both trigger a resize, there is a small chance of corrupting the internal data structure, after which lookups such as contains() can loop forever. There are many references to this problem online; here is one example:
Our system relies heavily on Drools and handles a high volume every day. We really need help from the Drools dev team and would much appreciate a patch. Thanks in advance.
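For illustration, here is a minimal sketch of one way the listener set could be made safe for concurrent registration. It is not the actual Drools code or an official patch: ListenerRegistry and its type parameter are hypothetical stand-ins for KnowledgeBaseImpl and KieBaseEventListener. It assumes that swapping the HashSet for a set backed by ConcurrentHashMap is acceptable, and that listeners are never null.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the listener handling in KnowledgeBaseImpl.
final class ListenerRegistry<L> {

    // A Set view over a ConcurrentHashMap: add, remove and contains are
    // thread-safe, so concurrent registrations cannot corrupt the table.
    // Note: unlike HashSet, this set rejects null elements.
    private final Set<L> listeners =
            Collections.newSetFromMap(new ConcurrentHashMap<L, Boolean>());

    // Set.add() already skips duplicates, so the separate contains() check
    // (a non-atomic check-then-act) is no longer needed.
    // Returns true if the listener was newly registered.
    boolean addEventListener(L listener) {
        return listeners.add(listener);
    }

    // Returns true if the listener was registered and has been removed.
    boolean removeEventListener(L listener) {
        return listeners.remove(listener);
    }

    int listenerCount() {
        return listeners.size();
    }
}
```

A CopyOnWriteArraySet would also work and is a common choice for listener collections, but each add copies the backing array, so it may be a poor fit if many sessions register listeners on the same knowledge base.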