ModeShape / MODE-2698

Improve performance of garbage collection for S3 binary storage


    • Type: Enhancement
    • Resolution: Unresolved
    • Priority: Major
    • 5.5
    • None
    • Storage

      Given that listing all S3 objects in a bucket and checking each one for an "unused" header is not very efficient, I propose using another mechanism to regularly clean up unused objects.
      It would rely on 2 features provided by S3:

      • object tagging
      • lifecycle rules filtered by tag

      In practice, such a lifecycle rule would be set up for (or by) ModeShape:

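      <LifecycleConfiguration>
        <Rule>
          <ID>ModeShape Garbage Collection</ID>
          <Status>Enabled</Status>
          <Filter>
            <Tag>
              <Key>unused</Key>
              <Value>true</Value>
            </Tag>
          </Filter>
          <!-- Assumed action, not part of the original proposal: S3 requires at
               least one action per rule, and 1 day is the shortest expiration. -->
          <Expiration>
            <Days>1</Days>
          </Expiration>
        </Rule>
      </LifecycleConfiguration>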
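
The same rule could also be installed programmatically. The following is only a sketch in Python, using the request shape of boto3's `put_bucket_lifecycle_configuration`; ModeShape itself is Java, so this is for illustration rather than a proposed implementation. The function names `build_gc_lifecycle` and `install_gc_rule` are hypothetical, and the `Expiration` action with `Days=1` is an assumption, added because S3 expects every lifecycle rule to carry at least one action.

```python
def build_gc_lifecycle(tag_key="unused", tag_value="true", days=1):
    """Return a lifecycle configuration that expires objects carrying the tag."""
    if days < 1:
        raise ValueError("S3 expiration needs a positive number of days")
    return {
        "Rules": [
            {
                "ID": "ModeShape Garbage Collection",
                "Status": "Enabled",
                # Only objects tagged unused=true are matched by this rule.
                "Filter": {"Tag": {"Key": tag_key, "Value": tag_value}},
                # Assumed action: delete matching objects after `days` days.
                "Expiration": {"Days": days},
            }
        ]
    }


def install_gc_rule(s3_client, bucket, **kwargs):
    """Apply the rule; s3_client is expected to be boto3.client("s3")."""
    # Caution: this call replaces the bucket's whole lifecycle configuration,
    # so any existing rules would have to be merged in by the caller.
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=build_gc_lifecycle(**kwargs),
    )
```

Passing the client in as a parameter keeps the sketch free of credentials and makes it easy to try against a stub before pointing it at a real bucket.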
      

      The main advantage would be to delegate the cleanup process to S3, freeing ModeShape from iterating over a (possibly) humongous list of objects.
      On the other hand, the interval at which this cleanup takes place is controlled by S3: in practice, every 24h AFAIK.
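
With cleanup delegated to S3, ModeShape's remaining job would be to tag a binary once it becomes unused, and to clear the tag if the binary is referenced again before S3 removes it. A minimal sketch of that step, again in Python with boto3-style calls purely for illustration: `mark_unused` and `mark_used` are hypothetical names, and the sketch assumes the objects carry no other tags, since both calls operate on the object's entire tag set.

```python
UNUSED_TAG = {"Key": "unused", "Value": "true"}


def mark_unused(s3_client, bucket, key):
    """Tag an object so that the lifecycle rule can expire it."""
    # put_object_tagging replaces the object's entire tag set.
    s3_client.put_object_tagging(
        Bucket=bucket,
        Key=key,
        Tagging={"TagSet": [UNUSED_TAG]},
    )


def mark_used(s3_client, bucket, key):
    """Clear the tag when the binary is referenced again."""
    # delete_object_tagging removes every tag on the object.
    s3_client.delete_object_tagging(Bucket=bucket, Key=key)
```

Each call touches a single object, so no bucket listing is needed on the ModeShape side.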

              Assignee: Unassigned
              Reporter: Damiano Albani (dalbani, Inactive)
              Votes: 0
              Watchers: 3