ModeShape / MODE-2698

Improve performance of garbage collection for S3 binary storage

    Details

    • Type: Enhancement
    • Status: Pull Request Sent
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 5.5
    • Component/s: Storage
    • Labels:

      Description

      Given that listing all S3 objects in a bucket and checking for an "unused" header is not very efficient, I propose to use another mechanism to regularly clean unused objects.
      It would rely on 2 features provided by S3: object tagging, and lifecycle rules that filter on a tag.

      In practice, a lifecycle rule such as the following would be set up for (or by) ModeShape:

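      <LifecycleConfiguration>
        <Rule>
          <ID>ModeShape Garbage Collection</ID>
          <Status>Enabled</Status>
          <Filter>
            <Tag>
              <Key>unused</Key>
              <Value>true</Value>
            </Tag>
          </Filter>
          <!-- Assumption: S3 requires at least one action per rule; the 1-day value is illustrative -->
          <Expiration>
            <Days>1</Days>
          </Expiration>
        </Rule>
      </LifecycleConfiguration>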
      

      The main advantage would be to delegate the cleanup to S3, which frees ModeShape from iterating over a potentially huge list of objects.
      On the other hand, the interval at which the cleanup takes place would be controlled by S3: in practice, every 24 hours AFAIK.
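
      To illustrate the ModeShape side of this proposal, here is a minimal sketch in Java of marking a binary as unused (or used again) by setting a tag on the object rather than scanning the bucket. The ObjectTagger interface, the InMemoryTagger class and the method names are hypothetical stand-ins, not ModeShape or AWS SDK types; a real implementation would presumably map them to S3's PutObjectTagging and DeleteObjectTagging operations.

```java
import java.util.HashMap;
import java.util.Map;

class UnusedTagSketch {

    /** Hypothetical abstraction over per-object tags (not a ModeShape or AWS SDK type). */
    interface ObjectTagger {
        void putTag(String objectKey, String tagKey, String tagValue);
        void removeTag(String objectKey, String tagKey);
        String getTag(String objectKey, String tagKey);
    }

    /** In-memory stand-in so the sketch runs without an S3 bucket. */
    static class InMemoryTagger implements ObjectTagger {
        private final Map<String, Map<String, String>> tags = new HashMap<>();

        @Override
        public void putTag(String objectKey, String tagKey, String tagValue) {
            tags.computeIfAbsent(objectKey, k -> new HashMap<>()).put(tagKey, tagValue);
        }

        @Override
        public void removeTag(String objectKey, String tagKey) {
            Map<String, String> objectTags = tags.get(objectKey);
            if (objectTags != null) {
                objectTags.remove(tagKey);
            }
        }

        @Override
        public String getTag(String objectKey, String tagKey) {
            Map<String, String> objectTags = tags.get(objectKey);
            return objectTags == null ? null : objectTags.get(tagKey);
        }
    }

    // Must match the <Key>/<Value> pair used in the lifecycle rule's tag filter.
    static final String UNUSED_TAG_KEY = "unused";
    static final String UNUSED_TAG_VALUE = "true";

    /** Flags an object so that the lifecycle rule would pick it up. */
    static void markAsUnused(ObjectTagger tagger, String objectKey) {
        tagger.putTag(objectKey, UNUSED_TAG_KEY, UNUSED_TAG_VALUE);
    }

    /** Clears the flag when the binary is referenced again before S3 removes it. */
    static void markAsUsed(ObjectTagger tagger, String objectKey) {
        tagger.removeTag(objectKey, UNUSED_TAG_KEY);
    }

    static boolean isUnused(ObjectTagger tagger, String objectKey) {
        return UNUSED_TAG_VALUE.equals(tagger.getTag(objectKey, UNUSED_TAG_KEY));
    }

    public static void main(String[] args) {
        ObjectTagger tagger = new InMemoryTagger();
        markAsUnused(tagger, "example-binary-key");
        System.out.println("unused after marking: " + isUnused(tagger, "example-binary-key"));
        markAsUsed(tagger, "example-binary-key");
        System.out.println("unused after reuse: " + isUnused(tagger, "example-binary-key"));
    }
}
```

      With this shape, ModeShape only ever touches the objects whose usage changes; the deletion itself is left to the lifecycle rule.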

    People

    • Assignee: Unassigned
    • Reporter: Damiano Albani (dalbani)