  1. Drools
  2. DROOLS-6571

DMN evaluation errors when the same FEEL object-access invocation is used by two kie containers


    • Bug
    • Resolution: Done
    • Major
    • None
    • 7.44.0.Final, 7.57.0.Final
    • dmn engine, kie server
    • None
    • 2021 Week 34-36 (from Aug 23)
    • Working on a minimal reproducer, for reference.
    • We managed to mitigate this issue before identifying the root cause by changing the implementation of some FEEL expressions that were identical in two kjars we wanted to run concurrently. This does not help when the same expression is used by two different versions of the same kjar artefact.
      As an example, both kjars used expressions like 'list contains(customBusinessObject.aProperty,"A_STRING")', passing in customBusinessObject as a decision input. In one kjar we replaced these expressions with 'list contains(aProperty,"A_STRING")', passing in 'aProperty' as a decision input and handling the property access ('customBusinessObject.aProperty') in our business process. This meant that only one of the kjars had to call the cached 'getAProperty()' method on customBusinessObject, and the other did not.

      When multiple kie containers (from the same kjar GAV or otherwise) contain DMNs (from the same source .dmn file or otherwise) that evaluate a FEEL expression that includes property access on the same source class, and the respective DMN decisions are evaluated sequentially, the second decision will have 'null' returned from the property access, causing FEEL functions that use that value to fail with "The parameter '<param>', in function <function>, cannot be null". A stack trace is produced in the kie server log, the key parts of which are detailed below.

      2021-08-20 12:44:07,912 ERROR [stderr] (default task-2) java.lang.IllegalArgumentException: object is not an instance of declaring class
       2021-08-20 12:44:07,912 ERROR [stderr] (default task-2) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       2021-08-20 12:44:07,912 ERROR [stderr] (default task-2) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
       2021-08-20 12:44:07,912 ERROR [stderr] (default task-2) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       2021-08-20 12:44:07,912 ERROR [stderr] (default task-2) at java.base/java.lang.reflect.Method.invoke(Method.java:566)
       2021-08-20 12:44:07,913 ERROR [stderr] (default task-2) at deployment.kie-server.war//org.kie.dmn.feel.util.EvalHelper.getDefinedValue(EvalHelper.java:415)
       2021-08-20 12:44:07,913 ERROR [stderr] (default task-2) at deployment.kie-server.war//org.kie.dmn.feel.util.EvalHelper.getValue(EvalHelper.java:443)
       2021-08-20 12:44:07,913 ERROR [stderr] (default task-2) at deployment.kie-server.war//org.kie.dmn.feel.lang.ast.QualifiedNameNode.evaluate(QualifiedNameNode.java:70)
       2021-08-20 12:44:07,913 ERROR [stderr] (default task-2) at deployment.kie-server.war//org.kie.dmn.feel.lang.ast.FunctionInvocationNode.lambda$evaluate$0(FunctionInvocationNode.java:82)
       [...7 lines of Java streams frames]
       2021-08-20 12:44:07,913 ERROR [stderr] (default task-2) at deployment.kie-server.war//org.kie.dmn.feel.lang.ast.FunctionInvocationNode.evaluate(FunctionInvocationNode.java:82)
       2021-08-20 12:44:07,913 ERROR [stderr] (default task-2) at deployment.kie-server.war//org.kie.dmn.feel.lang.impl.CompiledExpressionImpl.apply(CompiledExpressionImpl.java:47)
       [...243 more lines]
       

      I've been able to track this down to the accessorCache in EvalHelper, whose scope is the whole kie server, whereas the methods it caches come from the classloader associated with a given kie container. When the second kie container comes to evaluate the FEEL expression, the cache returns a method from the first kie container's classloader, which is then invoked against an object whose type comes from the second kie container's classloader.

      Here's my more in-depth analysis from our internal documentation of the issue, along with the proposed fix:

      This is where it gets fiddly. From what we can tell, the kie server has its own classloader ('ModuleClassLoader for Module "deployment.kie-server.war" from Service Module Loader'), in which EvalHelper lives, plus one classloader per kie container (kjar deployment). Note that this is per container, not per kjar version: two deployments of the same version each get their own classloader. This is presumably to facilitate hot-loading of containers after the kie server is initialised.

      Thus, the accessorCache field in EvalHelper is shared between all kie containers, but the classes in each kie container's classloader are unique: for any given class 'com.example.C' (from here sometimes just C) used by both kie containers k~1~, k~2~ (with classloaders c~1~, c~2~ respectively), the representation of class C in c~1~ does not equal the representation of class C in c~2~.

      So when a process p~1~ in container k~1~ evaluates a FEEL expression containing an object access expression on an object of type C (say, 'c.property'), the appropriate accessor method m~1~ (representing the method 'com.example.C.getProperty()') is stored in the accessorCache map with key "com.example.C.property". When another process p~2~ in container k~2~ evaluates another FEEL expression containing the same object access expression on an object of type C (again, 'c.property'), EvalHelper consults its accessorCache for the key "com.example.C.property" and finds method m~1~, but the current instance of C (from process p~2~) was instantiated from the copy of C in c~2~. Invoking method m~1~ against an instance of C from c~2~ results in the exception "java.lang.IllegalArgumentException: object is not an instance of declaring class".
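The failure mode described above can be sketched in isolation. This is a simplified illustration, not the EvalHelper code: the class `SharedAccessorCacheDemo`, its `getValue` helper, and the two stand-in classes are hypothetical. In the real issue both classes are 'com.example.C', loaded by c~1~ and c~2~; here two differently named nested classes play that role, because all that matters to `Method.invoke` is that the receiver is not an instance of the method's declaring class.

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

class SharedAccessorCacheDemo {
    // Hypothetical stand-ins for "com.example.C as loaded by c1" and "... by c2".
    public static class CFromLoader1 { public String getProperty() { return "from k1"; } }
    public static class CFromLoader2 { public String getProperty() { return "from k2"; } }

    // One cache shared by every caller, keyed only by a name string.
    static final Map<String, Method> accessorCache = new HashMap<>();

    static Object getValue(Object target, String key) {
        try {
            Method m = accessorCache.get(key);
            if (m == null) {
                // First caller wins: its Method object is cached under the shared key.
                m = target.getClass().getMethod("getProperty");
                accessorCache.put(key, m);
            }
            // Throws IllegalArgumentException if target is not an instance
            // of the class that declared the cached method.
            return m.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String key = "com.example.C.property";
        System.out.println(getValue(new CFromLoader1(), key));
        try {
            getValue(new CFromLoader2(), key);
        } catch (IllegalArgumentException e) {
            System.out.println("second lookup failed: " + e);
        }
    }
}
```

The first lookup succeeds and populates the cache; the second reuses the cached method against an object of a different class and fails inside `Method.invoke`, matching the top frames of the stack trace.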

      The proposed fix is to change the key of the 'accessorCache' map to include some unique representation of the classloader from which the method was taken. In the above example, the key would be "<classLoader.getClass().getSimpleName()>@<classLoader.hashCode()>.com.example.C.property". Then the method m~1~ has key "<c~1~-classname>@<c~1~-hashcode>.com.example.C.property", the method m~2~ is added with key "<c~2~-classname>@<c~2~-hashcode>.com.example.C.property", and the two are not confused.
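A minimal sketch of the key scheme described above, under stated assumptions: `LoaderAwareAccessorCache`, `cacheKey`, and `accessorFor` are hypothetical names for illustration and are not the actual EvalHelper patch. The "bootstrap" label for classes whose classloader is null (such as `java.lang.String`) is also an assumption, as is the simple "get" + capitalised property getter lookup.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class LoaderAwareAccessorCache {
    private final Map<String, Method> cache = new ConcurrentHashMap<>();

    // Builds "<loader simple name>@<loader hashCode>.<class name>.<property>".
    static String cacheKey(Class<?> type, String property) {
        ClassLoader loader = type.getClassLoader(); // null for bootstrap classes
        String loaderId = (loader == null)
                ? "bootstrap" // assumption: label chosen for this sketch
                : loader.getClass().getSimpleName() + "@" + loader.hashCode();
        return loaderId + "." + type.getName() + "." + property;
    }

    // Returns the cached getter for the property, or null if the type has none.
    Method accessorFor(Class<?> type, String property) {
        return cache.computeIfAbsent(cacheKey(type, property), k -> {
            String getter = "get" + Character.toUpperCase(property.charAt(0))
                    + property.substring(1);
            try {
                return type.getMethod(getter);
            } catch (NoSuchMethodException e) {
                return null; // computeIfAbsent stores nothing for a null result
            }
        });
    }
}
```

Because the key is derived from the class's own classloader, a class C loaded by c~1~ and a class C loaded by c~2~ map to different entries, provided the two loaders report different hash codes. A default `hashCode()` is not guaranteed to be unique, so this remains a sketch of the idea and not a hardened implementation.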

       

      We've successfully applied this fix as a patch to our 7.44.0.Final installation and it resolves the issue, and I'm in the process of raising a PR for the proposed solution outlined above.

              mmortari@redhat.com Matteo Mortari
              luke.armitage@sky.uk Luke Armitage (Inactive)