  1. ModeShape
  2. MODE-2109

Support nodes with a very large number of unordered and uniquely-named children

    • Type: Feature Request
    • Resolution: Done
    • Priority: Major
    • Fix Version/s: 4.4.0.Final
    • Affects Version/s: 3.6.0.Final
    • Component/s: JCR
    • Labels: None

      One of the key benefits of ModeShape is the ability to store data in a hierarchy of nodes that can be accessed and modified in a fine-grained manner, still within the context of ACID transactions. The whole repository should be designed with this hierarchy in mind.

      Consider a case where a repository is to store details on a large number of customers. Each customer is represented with a tree of nodes. Naively, all of the customers could be stored under a single parent node, but this starts to perform poorly as the number of customers grows to the tens of thousands or more.

      The solution is to not place all the customer nodes under a single parent, but to use a multi-tiered intermediate structure. For example, if each customer were named with a UUID, then the intermediate structure can take advantage of the hexadecimal representation of the UUID:

       + customers
            + <level1>
                 + <level2>
                      + <customerID>
      

      where "<level1>" is named by the first 2 hexadecimal characters of the UUID, "<level2>" is named by the third and fourth hexadecimal characters, and "<customerID>" is named with the full hexadecimal form of the UUID. For example, the node representing the customer with ID "abcdef00-5b8f-11e3-949a-0800200c9a66" would be found at:

        /customers/ab/cd/abcdef00-5b8f-11e3-949a-0800200c9a66
      

      This certainly works, but it requires the application(s) to know and manage this multi-tier intermediate structure. (Note that applications can easily encapsulate the logic of finding or creating a customer node with a given customer ID.)
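
      The path computation described above could be encapsulated roughly as follows. This is only a sketch: the class name CustomerPaths and the method pathFor are hypothetical, and it assumes customer IDs are java.util.UUID values whose canonical lowercase string form is used as the node name. It builds the path string only and does not touch the JCR API.

```java
import java.util.UUID;

/** Hypothetical helper; the class and method names are illustrative only. */
final class CustomerPaths {
    private CustomerPaths() {}

    /** Builds the two-tier path for a customer: /customers/<level1>/<level2>/<customerID>. */
    static String pathFor(UUID customerId) {
        // UUID.toString() yields the canonical lowercase hexadecimal form with hyphens.
        String id = customerId.toString();
        String level1 = id.substring(0, 2); // first two hexadecimal characters
        String level2 = id.substring(2, 4); // third and fourth hexadecimal characters
        return "/customers/" + level1 + "/" + level2 + "/" + id;
    }
}
```

      An application would pass the resulting path to whatever find-or-create logic it already uses, so the tiering rule lives in one place.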

      ModeShape should support a special type of node that can be used for the "/customers" parent node in the above example, such that:

      • millions of children can be stored directly under the special single parent
      • accessing a child via a path should be fast and independent of the number of existing children
      • adding and removing a child should be fast and independent of the number of existing children
      • iterating through all child nodes should be possible even with limited memory
      • determining the number of children may be expensive
      • accessing the children by JCR ID (e.g., node key) should be as fast as any other node
      • querying the children should be as fast as any other node

      Note: ModeShape may require significant memory to work efficiently with a repository of tens of millions of nodes, in order to prevent a lot of churn in the workspace cache.

      However, the solution may also bring other restrictions such as:

      1. the parent must be created as this special kind of collection, perhaps with a primary type that is or extends a special, pre-defined ModeShape-specific node type (e.g., mode:unorderedLargeCollection).
      2. the children will remain unordered
      3. same-name siblings are not allowed: each child's name must be unique and should be determined independently of the number or names of existing children; see JCR 2.1's new "Node.createChildName( String hint )" method for an easy way to generate unique names.
      4. children may not be renamed
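
      To make the proposed contract concrete, here is a toy in-memory model of the restrictions listed above. It is not ModeShape code and says nothing about how the feature would be implemented; the class UnorderedCollectionModel and all of its methods are hypothetical. It only illustrates the behavior an application could expect: unique child names, no ordering guarantee, no rename operation, and names generated without inspecting existing children (random UUIDs are an assumption made for this sketch).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.UUID;

/** Toy model of the proposed restrictions; hypothetical and not part of ModeShape. */
final class UnorderedCollectionModel<V> {
    // A hash map keeps no insertion order (restriction 2) and gives lookups,
    // additions and removals whose cost does not grow with the number of children.
    private final Map<String, V> children = new HashMap<>();

    /** Adds a child under the given name; duplicate names are rejected (restriction 3). */
    void addChild(String name, V value) {
        Objects.requireNonNull(name, "name");
        Objects.requireNonNull(value, "value");
        if (children.putIfAbsent(name, value) != null) {
            throw new IllegalStateException("Same-name siblings are not allowed: " + name);
        }
    }

    /** Adds a child under a generated name that does not depend on existing children. */
    String addChildWithGeneratedName(V value) {
        String name = UUID.randomUUID().toString();
        addChild(name, value);
        return name;
    }

    /** Returns the child with the given name, or null if there is none. */
    V getChild(String name) {
        return children.get(name);
    }

    /** Removes the child with the given name; returns true if it existed. */
    boolean removeChild(String name) {
        return children.remove(name) != null;
    }

    /** Read-only view of the child names, in no particular order. */
    Set<String> childNames() {
        return Collections.unmodifiableSet(children.keySet());
    }

    // Restriction 4: there is deliberately no rename operation.
}
```

      Under this model, changing a child's name means removing it and adding it again under the new name, which is one possible reading of restriction 4.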

              hchiorean Horia Chiorean (Inactive)
              rhauch Randall Hauch (Inactive)
              Votes: 22
              Watchers: 8
