-
Epic
-
Resolution: Unresolved
-
Critical
-
None
-
Team A: Dashboard, devfile and plugin registries, open-vsx, devfile-converter, configbump + traefik, image-puller, server, gateway, authentication, try-in-web-ide action, telemetry; Team B: Dev Spaces operator + chectl/dsc, DevWorkspace + Operator, Web Terminal + Operator, Universal Developer Image, machine-exec, dev environment config
In the event of a cluster outage, a significant number of developers will be unable to work.
What is needed is a mechanism to restore a workspace to a known good state, including any uncommitted changes to the code base or other work.
Describe the solution you'd like
I am building a simple prototype which uses a FROM scratch container image to store the state of a workspace.
https://github.com/cgruver/workspace-backup-prototype
Prototype Backup flow -
- A CronJob runs every hour in the Dev Spaces namespace.
- The CronJob looks for dev workspaces which were stopped within the last hour and are currently not running.
- The CronJob creates a Job in the user's namespace which uses Buildah to create a container image with the contents of /projects from the workspace PVC.
- The container image is pushed to an external image registry.
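The selection step in the backup flow above can be sketched as follows. This is a minimal illustration, not the prototype's code: the record shape (`name`, `phase`, `stopped_at`) and the function name `select_for_backup` are hypothetical, and how the stop time is actually obtained from a DevWorkspace is left as an assumption.

```python
from datetime import datetime, timedelta, timezone


def select_for_backup(workspaces, now, window=timedelta(hours=1)):
    """Return workspaces that are stopped now and were stopped within `window`.

    Each workspace is a plain dict with hypothetical keys:
      name        workspace name
      phase       current phase, e.g. "Running" or "Stopped" (assumed values)
      stopped_at  timezone-aware datetime of the last stop, or None
    """
    selected = []
    for ws in workspaces:
        # Skip anything that is currently running (or in any other phase).
        if ws["phase"] != "Stopped":
            continue
        stopped_at = ws.get("stopped_at")
        if stopped_at is None:
            continue
        # Keep only workspaces stopped within the last `window`.
        if timedelta(0) <= now - stopped_at <= window:
            selected.append(ws)
    return selected
```

Each selected workspace would then get its own backup Job in the user's namespace, as described in the list above.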
Prototype Restore flow -
- The user logs into a secondary OpenShift cluster that has Dev Spaces installed.
- The user creates a new workspace from the Git URL of the workspace that needs to be restored.
- The user indicates that they wish for the workspace to be restored from a backup. (Right now that is a manual flow using modifications to the Devfile to inject an init container. The desired flow is for a selection in the dashboard to request the restore.)
- The workspace is created via the normal flow, except that an init container runs after PVC creation, pulls the backup image, and copies its contents to ${PROJECTS_ROOT} before the workspace starts.
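The copy step performed by the init container in the restore flow above might look roughly like the sketch below. It assumes the backup image contents are already available at a local directory; the function name `restore_projects`, the `backup_dir` argument, and the `/projects` fallback are illustrative assumptions, not part of the prototype.

```python
import os
import shutil


def restore_projects(backup_dir, projects_root=None):
    """Copy the contents of `backup_dir` into the workspace projects directory.

    `projects_root` defaults to the PROJECTS_ROOT environment variable,
    falling back to /projects (an assumed default).
    """
    target = projects_root or os.environ.get("PROJECTS_ROOT", "/projects")
    os.makedirs(target, exist_ok=True)
    # dirs_exist_ok=True (Python 3.8+) merges into an existing directory,
    # which is needed because the PVC mount point already exists.
    shutil.copytree(backup_dir, target, symlinks=True, dirs_exist_ok=True)
    return target
```

Files from the backup overwrite files of the same name in the target, so a restore into a freshly cloned workspace would bring back the uncommitted changes captured in the backup.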
I am currently working on extending the prototype to use the internal registry of the secondary OCP cluster in order to manage RBAC on the container images and restrict access to the user who created the original workspace.
Describe alternatives you've considered
PVC snapshots - rejected because of the complexity of managing restore across clusters.
DevWorkspace mirroring - rejected because of the complexity of syncing data and Custom Resources across clusters.
- is related to
-
CRW-9513 Usage of OpenShift API for Data Protection (OADP) for backup and restore
-
- Open