Type: Enhancement
Resolution: Unresolved
Priority: Major
Fix Version/s: 5.5
Affects Version/s: None
Component/s: Storage
Labels:
- contribution

The S3BinaryStore currently stores extracted text as "user metadata" of S3 objects.
S3 limits user metadata to 2 KB per object, so the extracted text cannot exceed that size.
This is IMHO a very low limit, given that I have many documents whose extracted text is around 100 KB.

I suggest storing extracted text as dedicated objects in S3 instead, for example under the binary's key with an /extracted-text suffix:

...
751f58c75aba8627e4d5b591aa7ceec5413c6a6a
751f58c75aba8627e4d5b591aa7ceec5413c6a6a/extracted-text
77355f3b1916329e7abd7e17987d543a09c36471
77355f3b1916329e7abd7e17987d543a09c36471/extracted-text
77705003b5ea10bc6664af107242af37bfef7115
77705003b5ea10bc6664af107242af37bfef7115/extracted-text
...
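
The key layout above could be captured in a small helper along these lines. This is only a sketch: the class name ExtractedTextKeys and its methods are hypothetical and not part of S3BinaryStore, and it assumes binary keys are plain SHA-1 hex strings that never end in the suffix themselves.

```java
import java.util.Objects;

/** Hypothetical helper mapping a binary's key to the key of its extracted-text object. */
final class ExtractedTextKeys {
    /** Suffix proposed in this issue; the exact value is an assumption. */
    static final String SUFFIX = "/extracted-text";

    private ExtractedTextKeys() {
    }

    /** Returns true if the key names an extracted-text object rather than a binary. */
    static boolean isTextKey(String key) {
        return Objects.requireNonNull(key, "key").endsWith(SUFFIX);
    }

    /** Builds the extracted-text key for a binary key, e.g. "abc" becomes "abc/extracted-text". */
    static String textKeyFor(String binaryKey) {
        if (isTextKey(binaryKey)) {
            throw new IllegalArgumentException("Already an extracted-text key: " + binaryKey);
        }
        return binaryKey + SUFFIX;
    }

    /** Recovers the binary key from an extracted-text key by stripping the suffix. */
    static String binaryKeyFor(String textKey) {
        if (!isTextKey(textKey)) {
            throw new IllegalArgumentException("Not an extracted-text key: " + textKey);
        }
        return textKey.substring(0, textKey.length() - SUFFIX.length());
    }
}
```

Keeping the suffix in one place would let the store read, write and delete the text object next to its binary without repeating string concatenation.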

I don't foresee any particular issue with the implementation.
The only open point is the getAllBinaryKeys() method, which I don't know how to implement with the current API of the S3 SDK: S3Objects doesn't support setting a "delimiter", which is needed to filter the "xxx/extracted-text" entries out of the listing.
I've opened a pull request in that regard: https://github.com/aws/aws-sdk-java/pull/1132
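
Until a delimiter can be set, one possible fallback is to filter the listing on the client side. This is a different technique from the delimiter approach described above: every key is still returned by S3, and the extracted-text entries are simply skipped afterwards. The sketch below works on plain key strings so that it does not depend on any SDK type; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical client-side filter for a flat S3 key listing. */
final class BinaryKeyListing {
    /** Suffix proposed in this issue; the exact value is an assumption. */
    static final String TEXT_SUFFIX = "/extracted-text";

    private BinaryKeyListing() {
    }

    /** Returns the listed keys in their original order, minus the extracted-text entries. */
    static List<String> binaryKeysOnly(Iterable<String> listedKeys) {
        List<String> binaryKeys = new ArrayList<>();
        for (String key : listedKeys) {
            if (!key.endsWith(TEXT_SUFFIX)) {
                binaryKeys.add(key);
            }
        }
        return binaryKeys;
    }
}
```

The trade-off is that a bucket in which every binary has extracted text lists roughly twice as many keys as getAllBinaryKeys() needs, which a server-side delimiter would avoid.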

Assignee: Unassigned

Reporter: Damiano Albani (Inactive)

Votes: 0

Watchers: 1

Created: 2017/05/02 4:05 AM

Updated: 2017/05/02 4:10 AM
