I have noticed Nullpointer exceptions when I try and reload a persisted session on startup. It is a bit hard to recreate (I am actually not managing it now since I deleted all old sessions to attempt recreation) but I figured maybe someone has an idea. Essentially I am getting this NPE twice:
at org.drools.persistence.jpa.JpaTimerJobInstance.call(JpaTimerJobInstance.java:50) [drools-persistence-jpa-6.3.0.Final.jar:6.3.0.Final]
at org.drools.persistence.jpa.JpaTimerJobInstance.call(JpaTimerJobInstance.java:30) [drools-persistence-jpa-6.3.0.Final.jar:6.3.0.Final]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_51]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_51]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.8.0_51]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_51]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_51]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_51]
When debugging I noticed the following behaviour which points to a race condition:
If I start my server and try to recreate the session, those NPEs happen 2x.
If I start my server and put a breakpoint BEFORE creating the CommandService for execution, I can wait for a few seconds and the service can be found.
It appears that the scheduler's timerJobFactoryManager is not fully there at the time the sessions is being loaded? Or something else is racing with the service creation.
Is there a way for me to make sure everything is fully instantiated before I use drools? Does a workaround exist? Is this even a bug?
Thanks and let me know your thoughts. If I run into this again I will attempt to try and reproduce it. Let me know if more info is needed!
I noticed that the reason I can't reproduce it at the moment is that the JpaTimerJobInstance is never called with the newly persisted session. It appears that that timer job depends on something else to happen so that it thinks it needs to start the timerJob