The design has been evolving, and I've been pushing (overwriting) new versions of the branch. Here's a summary of the basic design:
The primary goal is to enable storing dynamically-structured values with metadata, and to also enable describing the structure of each value (and metadata) using a schema-based approach. JSON documents provide an excellent way to offer structure that is extremely flexible, while JSON Schema offers a way to define the structure of JSON documents in a way that can be easily validated. (Note that a JSON Schema is just a JSON document that conforms to the JSON meta-schema, which is rich enough to be self-describing. It's actually a very nice specification.)
Manik originally suggested storing the metadata and value (henceforth referred to as 'content') as strings, but doing so would mean that in order to access any information within the metadata or content, the JSON strings would first need to be parsed into an in-memory representation. Plus, if the content is to be modified, the in-memory document would need to be modified and then written back out as a JSON string before being stored. Repeating this parsing and writing on every access would become prohibitively expensive.
Since Infinispan is essentially a large heap of memory, it makes far more sense to represent the content and metadata as in-memory documents, as long as the in-memory representation is compatible with JSON, is easy to use, and can be validated using JSON Schemas. Additionally, if the representation also supports BSON data types (e.g., binary values, UUIDs, dates, regular expressions, etc.), more types of user content can be supported (including just raw binary data). These in-memory documents could at any time be read from or written to JSON or BSON formats. Having the schematic values be delta-aware with fine-grained locking (see ISPN-1115) would provide significant advantages with respect to performance and concurrency. (Note that efficient delta-aware support means that the schematic value can capture the changes made to the documents by the client application and use those changes as the delta, rather than having to compare the changed document to a prior version to compute the changes.)
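To make the delta-aware idea concrete, here is a minimal sketch of a document that records changes as they are made, so the delta never has to be computed by comparing two versions. Everything in it (RecordingDocument, Change, takeDelta, apply) is a hypothetical name chosen for illustration; it is not the Infinispan DeltaAware API, and it only handles flat fields.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: these types are NOT part of Infinispan.
final class RecordingDocument {

    /** One recorded change; a null value means "the field was removed". */
    static final class Change {
        final String field;
        final Object value;

        Change(String field, Object value) {
            this.field = field;
            this.value = value;
        }
    }

    private final Map<String, Object> fields = new LinkedHashMap<>();
    private final List<Change> pending = new ArrayList<>();

    /** Sets a field and records the change. Null values are rejected, since null marks a removal. */
    void set(String field, Object value) {
        Objects.requireNonNull(value, "use remove() to delete a field");
        fields.put(field, value);
        pending.add(new Change(field, value));
    }

    /** Removes a field and records the removal. */
    void remove(String field) {
        fields.remove(field);
        pending.add(new Change(field, null));
    }

    Object get(String field) {
        return fields.get(field);
    }

    /** Returns the changes made since the last call and clears the log. */
    List<Change> takeDelta() {
        List<Change> delta = new ArrayList<>(pending);
        pending.clear();
        return delta;
    }

    /** Applies a delta produced by another copy, without recording it again. */
    void apply(List<Change> delta) {
        for (Change change : delta) {
            if (change.value == null) {
                fields.remove(change.field);
            } else {
                fields.put(change.field, change.value);
            }
        }
    }
}
```

A copy on another node would call apply() with the list returned by takeDelta(), so only the recorded operations travel rather than the whole document.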
Using an in-memory representation also means that the content and metadata need not be stored as separate objects, but could instead be represented by a single document that is conceptually:
{
  "metadata" : {
  },
  "content" :
}
This is the approach taken by the current design. The primary packages are:
- org.infinispan.schematic
- org.infinispan.schematic.document
- org.infinispan.schematic.internal.*
The first two packages contain the public API, whereas all implementation-specific classes are contained within the "internal" packages.
The primary API interfaces are:
- SchematicDb - similar to Cache but tailored to make it easy for users to store a content document (or binary value) with a metadata document. Each SchematicDb has a JSON Schema library and provides a map-reduce-based validation mechanism. Internally this uses a Cache<String,SchematicEntry>.
- SchematicEntry - the value actually stored within Infinispan, which contains a content object (either a Document or a Binary value) and a metadata Document. There are methods for getting a mutable interface to the content and metadata documents. Since tracking the MIME type of the content is likely very common, the SchematicEntry interface provides methods for getting and setting the MIME type (which is actually stored in the metadata).
- Document - an immutable interface to an in-memory document
- EditableDocument - a mutable interface to an in-memory document
- Json - utility class for parsing JSON formatted streams/files into Document instances, and for writing Document instances as JSON
- Bson - utility class for parsing BSON formatted streams/files into Document instances, and for writing Document instances as BSON
- JsonSchema - utility class for working with JSON Schemas
- Various interfaces for representing JSON/BSON values: Array, Binary, Symbol, Timestamp, Code, CodeWithScope
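The split between a read-only and a mutable document interface, and an entry that keeps the MIME type inside its metadata, could look roughly like the sketch below. The type names (ReadableDoc, WritableDoc, MapDoc, EntrySketch), the method names, and the "contentType" field name are all assumptions made for illustration; they only mirror the roles described above and are not the actual signatures in the branch.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: these types only mirror the roles listed above.

/** Read-only view of a document. */
interface ReadableDoc {
    Object get(String field);

    /** Unmodifiable view of the fields. */
    Map<String, Object> asMap();
}

/** Mutable view of the same document. */
interface WritableDoc extends ReadableDoc {
    WritableDoc set(String field, Object value);
}

/** Simple map-backed document implementing both views. */
final class MapDoc implements WritableDoc {
    private final Map<String, Object> fields = new LinkedHashMap<>();

    public Object get(String field) {
        return fields.get(field);
    }

    public Map<String, Object> asMap() {
        return Collections.unmodifiableMap(fields);
    }

    public WritableDoc set(String field, Object value) {
        fields.put(field, value);
        return this;
    }
}

/** One stored value: a content document plus a metadata document. */
final class EntrySketch {
    // Assumed field name; the source does not say which metadata key holds the MIME type.
    static final String MIME_FIELD = "contentType";

    private final MapDoc metadata = new MapDoc();
    private final MapDoc content = new MapDoc();

    ReadableDoc getMetadata() { return metadata; }
    ReadableDoc getContent() { return content; }

    WritableDoc editMetadata() { return metadata; }
    WritableDoc editContent() { return content; }

    // Convenience accessors that simply read and write the metadata document.
    String getContentType() { return (String) metadata.get(MIME_FIELD); }
    void setContentType(String mimeType) { metadata.set(MIME_FIELD, mimeType); }
}
```

In this sketch the read-only and mutable views share one underlying object, so the separation is enforced only by the static type; a real implementation would more likely hand out a separate editor object that can also record changes for the delta.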
The current status is that this works for LOCAL mode, but additional work is required before DISTRIBUTED and REPLICATED modes will work correctly with delta-aware and fine-grained locking.
As always, feedback is appreciated.
Rejecting, since there is no reason for this anymore.