Since its inception, Infinispan has used JBoss Marshalling for all of its marshalling needs. With some tweaking (e.g. hooking in a custom ObjectTable instance), the JBoss Marshalling based Infinispan externalizer layer is able to produce tiny binary payloads, but it has some problems, partly due to JBoss Marshalling itself and partly due to our own implementation details:
JBoss Marshalling's objective has always been to produce a binary format that complies with the Java serialization specification, but this is not a requirement for Infinispan. In fact, to reduce the payload size, Infinispan hooks in at the ObjectTable level and bypasses much of that format.
On top of the mismatch mentioned above, JBoss Marshalling’s programming model is based around creating a marshaller, writing to it, and then finishing with it by discarding its context (the same applies to unmarshalling). The problem here is two-fold:
- Both marshallers and unmarshallers are quite heavy objects that keep context information, such as references to instances that appear multiple times, so constantly creating them is costly. To avoid wasting resources, we ended up adding thread locals that keep a number of marshaller/unmarshaller instances per thread (see ISPN-1815). These thread locals can potentially increase memory consumption (see user dev post).
- The second problem is the need to support reentrant marshalling calls when storing data in binary format. The need for reentrancy appears in situations like this: imagine you have to marshall a PutKV command, so you start a marshaller and write some data. Then you have to store the key and value, but these are kept in binary form, so each has to be converted to bytes first: another marshaller has to be created, the key/value information written to it, and that marshaller finished, before the resulting bytes are written into the command itself. So there needs to be a way to start a second marshaller without having finished the first one. This is the reason why the changes added in ISPN-1815 resulted in the thread local keeping a number of marshaller/unmarshaller instances rather than a single one.
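The reentrancy requirement can be illustrated with a minimal sketch. It does not use the JBoss Marshalling API or Infinispan's actual code: `Context`, `PerThread`, `marshall` and `marshallPut` are hypothetical names, and a plain `DataOutputStream` stands in for a heavyweight marshaller. The point is only to show why a single cached instance per thread is not enough once calls nest.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ReentrantMarshallingSketch {

    // Hypothetical stand-in for a heavyweight marshaller: a buffer plus a stream over it.
    static final class Context {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        final DataOutputStream out = new DataOutputStream(bytes);
    }

    // Per-thread state: one reusable context per nesting level, plus the current depth.
    static final class PerThread {
        final List<Context> contexts = new ArrayList<>();
        int depth = 0;
    }

    static final ThreadLocal<PerThread> POOL = ThreadLocal.withInitial(PerThread::new);

    interface Writer {
        void write(DataOutputStream out) throws IOException;
    }

    // Runs the writer against the context for the current nesting level.
    // A nested call gets the next context, so the outer one is left untouched.
    static byte[] marshall(Writer writer) {
        PerThread perThread = POOL.get();
        if (perThread.depth == perThread.contexts.size()) {
            perThread.contexts.add(new Context()); // created once, then reused
        }
        Context ctx = perThread.contexts.get(perThread.depth);
        perThread.depth++;
        try {
            ctx.bytes.reset();
            writer.write(ctx.out);
            ctx.out.flush();
            return ctx.bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            perThread.depth--;
        }
    }

    // Marshalls a put-like command whose key and value are stored as byte arrays.
    static byte[] marshallPut(String key, String value) {
        return marshall(out -> {
            out.writeByte(1); // hypothetical command id
            byte[] k = marshall(o -> o.writeUTF(key));   // nested call, outer still open
            byte[] v = marshall(o -> o.writeUTF(value)); // reuses the second context
            out.writeInt(k.length);
            out.write(k);
            out.writeInt(v.length);
            out.write(v);
        });
    }

    // Number of contexts the current thread is holding on to.
    static int instancesOnThisThread() {
        return POOL.get().contexts.size();
    }
}
```

After a single `marshallPut` call the thread holds two contexts, one for the command and one for the nested key/value writes, and both stay referenced by the thread local for as long as the thread lives. That is the memory trade-off described above.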
Finally, for inter-node cluster communication and for storing data in the persistence layer, Infinispan uses JBoss Marshalling both for the types it knows about, e.g. internal data types, and for the types it does not know about, e.g. key and value types. This means that even though the marshaller is configurable, it’s not easy to switch to a different marshaller (see here for an example where we try to use a different marshaller). This problem is not present in Hot Rod Java clients, since there JBoss Marshalling is used purely to marshall keys and values, so it’s very easy to test out a different marshaller.
With all this in mind, the following change recommendations can be made:
- For the types that we know about, marshall them manually in the most compact way possible. The JBoss Marshalling codebase already contains a lot of code for encoding basic types (e.g. Strings, numbers, etc.), so we should be able to reuse it.
- Only rely on 3rd party marshalling libraries for types we don’t know about, e.g. key and value types. If these key/value types happen to be primitives or primitive derivations (e.g. arrays), we should be able to optimise those too, so 3rd party marshalling libraries are only used for custom unknown types. The benefit here is that we decouple Infinispan from using JBoss Marshalling all over the place, making it easier to try different marshalling mechanisms.
- With JBoss Marshalling only used for unknown custom types, it's fine if the JBoss Marshalling marshaller implementation wants to use thread locals: we effectively get rid of them everywhere except for custom types when the JBoss Marshalling marshaller is used. On top of that, we can switch to, or try out, different 3rd party marshallers which might be better suited.
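The split proposed in these recommendations could look roughly like the sketch below. Everything in it is an assumption made for illustration: `UserMarshaller`, `JavaSerializationMarshaller`, `HybridMarshallerSketch` and the tag values are hypothetical names rather than Infinispan or JBoss Marshalling APIs, and plain Java serialization stands in for whichever 3rd party marshaller is plugged in.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

public class HybridMarshallerSketch {

    // Hypothetical plug-in point for types we know nothing about.
    public interface UserMarshaller {
        byte[] toBytes(Object o) throws IOException;
        Object fromBytes(byte[] b) throws IOException, ClassNotFoundException;
    }

    // Stand-in 3rd party marshaller based on plain Java serialization.
    public static final class JavaSerializationMarshaller implements UserMarshaller {
        public byte[] toBytes(Object o) throws IOException {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(o);
            }
            return bos.toByteArray();
        }

        public Object fromBytes(byte[] b) throws IOException, ClassNotFoundException {
            try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(b))) {
                return ois.readObject();
            }
        }
    }

    // One-byte type tags (hypothetical values) written in front of every payload.
    static final int TAG_STRING = 0, TAG_INT = 1, TAG_BYTES = 2, TAG_CUSTOM = 3;

    private final UserMarshaller user;

    public HybridMarshallerSketch(UserMarshaller user) {
        this.user = user;
    }

    public byte[] marshall(Object o) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            if (o instanceof String) {            // known type: written by hand
                out.writeByte(TAG_STRING);
                out.writeUTF((String) o);
            } else if (o instanceof Integer) {    // known type: written by hand
                out.writeByte(TAG_INT);
                out.writeInt((Integer) o);
            } else if (o instanceof byte[]) {     // primitive derivation: written by hand
                byte[] b = (byte[]) o;
                out.writeByte(TAG_BYTES);
                out.writeInt(b.length);
                out.write(b);
            } else {                              // unknown type: delegate to the plug-in
                byte[] b = user.toBytes(o);
                out.writeByte(TAG_CUSTOM);
                out.writeInt(b.length);
                out.write(b);
            }
            out.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public Object unmarshall(byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            int tag = in.readByte();
            switch (tag) {
                case TAG_STRING:
                    return in.readUTF();
                case TAG_INT:
                    return in.readInt();
                case TAG_BYTES: {
                    byte[] b = new byte[in.readInt()];
                    in.readFully(b);
                    return b;
                }
                case TAG_CUSTOM: {
                    byte[] b = new byte[in.readInt()];
                    in.readFully(b);
                    return user.fromBytes(b);
                }
                default:
                    throw new IOException("Unknown type tag " + tag);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch, trying a different marshaller for user types only means passing another `UserMarshaller` to the constructor. The hand-written paths for known types never touch it, so any thread locals or other per-instance state stay confined to the plug-in.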