
Type: Epic
Resolution: Done
Priority: Critical
Fix Version/s: 2021Q3 Plan, 2021Q4 Plan
Affects Version/s: None
Component/s: None
Labels:
None

Epic Name:
auto-pruning of Operator created objects
Blocked:
False
Ready:
False
Epic Status:
Done
Feature Link:
OCPPLAN-6823 - Make OpenShift Operators more mature
Flagged:

Impediment
Parent Link:
OCPPLAN-6823 - Make OpenShift Operators more mature
Hierarchy Progress Bar:

0% To Do, 0% In Progress, 100% Done
Release Note Text:
Undefined
Size:
M

SFDC Cases Links:
SFDC Cases Counter:
SFDC Cases Open:

Market:

Epic Goal

The SDK lets Operator authors easily define strategies, based on time duration or resource limits (e.g. storage space, overall etcd object count, or user-defined settings), to clean up, auto-prune, or harvest the objects their Operators create.

This harvester/cleanup process is part of a larger set of solutions to control bloat, which is key on very large clusters.
It is an ideal case for the SDK to give Operator authors a prewritten solution, which supports the SDK value proposition.
Good use of abstraction would create a ‘just works’ feature that also promotes best practices.

Why is this important?

Operators are widely used to control the deployment, configuration, and life cycle management of OpenShift layered products and products from our ecosystem.

In many cases, Operators generate resources/objects on the cluster so that users can later trace/track the work history (e.g. Build history, k8s Jobs, etc). This becomes a problem if the Operators do not actively harvest/clean/prune those objects, which then keep taking up cluster storage.

Currently, it’s either up to Operator authors to come up with their own way to deal with this or, in the worst case, it’s on the cluster admins to figure out when and which Operator-created objects can be safely removed/deleted instead of being left to hog storage space on the clusters.

Scenarios

Use Case 1:
There is an ETL (extract/transform/load) Operator that has a CR it manages called DataSource. At some frequency, files that match that DataSource are created and pushed to a location that the operator processes. Processing a file creates an ETL Job for that particular file, and the ETL Job runs to completion. The ETL operator might also want to capture the processing log to retain for future reference, but the Job itself would need to be harvested/cleaned up at some point.

Use Case 2:
There is a database operator that has a CR it manages called BackupSchedule. At some frequency the database will create a backup Job that runs to completion. The Job logs are archived into the database itself for future reference. Depending on the customer, the backup Jobs need to be harvested or cleaned from the system.

Acceptance Criteria

CI - MUST be running successfully with tests automated
Release Technical Enablement - Provide necessary release enablement details and documents.
An optional solution that Operator SDK users can easily opt in to for their projects:
- SDK generates pre-defined strategy code that runs as a background cron execution to clean up/prune/harvest Operator-created objects
- SDK users could specify a cron expression to control how often Operator-created objects are cleaned up/pruned/harvested
- SDK users could customize a strategy to control the cleanup/pruning/harvesting criteria in terms of resource limits (e.g. storage space, or overall etcd object count)
- SDK users could specify a preDelete hook function to support archival or log processing before the log is deleted from the cluster
- SDK users could specify pod/Job selection criteria using selector patterns or similar that determine what jobs/pods/etc get harvested
- SDK could provide a log of resources that get harvested for auditing purposes (i.e. easily see which deletions are executed by the Operators)
New upstream doc for introducing and guiding how to utilize this new feature in Operator projects.

Dependencies (internal and external)

...

Previous Work (Optional):

POC in operator-lib: https://github.com/operator-framework/operator-lib/pull/68

Open questions:

…

Done Checklist

CI - CI is running, tests are automated and merged.
Release Enablement <link to Feature Enablement Presentation>
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Downstream build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>

relates to

OCPPLAN-7766 Enable easy access to a set of reusable implementations/libraries (customized scaffolding)

Closed

1.	QE Tracker	Closed	Unassigned
2.	TE Tracker	Closed	Unassigned
3.	Docs Tracker	Closed	Unassigned
