Type: Feature
Resolution: Done
Priority: Major
Fix Version/s: openshift-4.18
Affects Version/s: None
Component/s: Microshift
Work Type: BU Product Work
Blocked: False
Blocked Reason: None
Ready: False
Hierarchy Progress Bar: 0% To Do, 100% In Progress, 0% Done
Size: M
Target Version: openshift-4.18
Risk Score: 0

Feature Overview (aka. Goal Summary)

MicroShift can recover from manual backups when startup fails, e.g. due to a corrupt etcd database or human error.

Goals (aka. expected user outcomes)

The goal of this feature is to provide an additional layer of protection and robustness for edge devices, especially against sudden loss of power or user errors. Users or admins can create manual backups using `microshift backup` (e.g. with a daily cron job). They can then point MicroShift to the folder containing those backups. In case of a startup failure, MicroShift will restore a backup (newest first) and try to start from it.
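As an illustration of the daily backup job mentioned above, here is a minimal sketch that builds a `microshift backup` command line targeting a timestamped directory. The helper name `backup_command`, the directory naming scheme, and the single positional target-path argument are assumptions made for this example; this ticket only names the `microshift backup` subcommand. The command is built but not executed, since MicroShift needs to be stopped while a backup is taken.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath
from typing import List, Optional


def backup_command(backup_root: PurePosixPath,
                   now: Optional[datetime] = None) -> List[str]:
    """Build a backup command line for a timestamped target directory.

    Assumption: `microshift backup` takes the target path as its only
    positional argument. Verify against the CLI help before relying on it.
    """
    now = now or datetime.now(timezone.utc)
    # Zero-padded UTC timestamps sort lexicographically in chronological
    # order, which makes "newest first" a simple reverse sort by name.
    name = now.strftime("%Y%m%d-%H%M%S")
    return ["microshift", "backup", str(backup_root / name)]
```

A cron-driven wrapper could stop the service, run the returned command (for example with `subprocess.run(cmd, check=True)`), and start the service again.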

Requirements (aka. Acceptance Criteria):

Provide a way to configure "autoRecover".
AutoRecover must ensure that the backup matches the current version/ostree commit (as we already do with backup/restore for updates).
Integrate with the existing backup/restore logic used during updates.
The failing config needs to be backed up for later post-mortem analysis of why it failed.
All steps and attempts need to be logged verbosely and explicitly, so that the sequence of events can be reconstructed easily.
Try all available backups, from newest to oldest.
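The requirements above can be sketched as a small recovery loop. This is not MicroShift's implementation: the `Backup` type, the `auto_recover` function, and the injected `restore`, `start`, and `preserve_failed` callbacks are hypothetical names chosen for this example, and version matching is simplified to string equality.

```python
import logging
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Iterable, Optional

log = logging.getLogger("auto_recovery_sketch")


@dataclass(frozen=True)
class Backup:
    path: Path
    created: float  # POSIX timestamp of when the backup was taken
    version: str    # version/ostree commit the backup belongs to


def auto_recover(backups: Iterable[Backup],
                 current_version: str,
                 restore: Callable[[Backup], None],
                 start: Callable[[], bool],
                 preserve_failed: Callable[[], None]) -> Optional[Backup]:
    """Try matching backups newest first; return the one that started."""
    # Keep the failing state for post-mortem analysis before touching it.
    log.info("preserving failed state before attempting recovery")
    preserve_failed()

    # Only backups taken on the same version/commit are candidates.
    candidates = sorted(
        (b for b in backups if b.version == current_version),
        key=lambda b: b.created,
        reverse=True,
    )
    log.info("found %d candidate backup(s) for version %s",
             len(candidates), current_version)

    for attempt, backup in enumerate(candidates, start=1):
        log.info("attempt %d: restoring %s", attempt, backup.path)
        restore(backup)
        if start():
            log.info("attempt %d: startup succeeded with %s",
                     attempt, backup.path)
            return backup
        log.warning("attempt %d: startup failed with %s",
                    attempt, backup.path)

    log.error("no candidate backup produced a successful startup")
    return None
```

Injecting the restore and start steps as callbacks keeps the ordering, filtering, and logging logic testable without a running service.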

Questions to Answer (Optional):

How to integrate with greenboot / ostree rollbacks? A: Probably not an issue, as rollbacks/restores are triggered only during an active upgrade (when a new commit is staged).
Do we try only the latest backup, or, if there are multiple suitable ones, proceed with older ones? A: It should work through the list of suitable backups from newest to oldest.
How to avoid potential conflicts with the automatic backups created for updates/rollbacks? A: Probably not an issue, see Q#1.

Out of Scope

Creating the backups: only the user/customer knows when there is a good time for this, as MicroShift needs to be stopped for the backup.
Controlling how many backups are kept on disk: that is the duty of the user/customer.
Deciding when to trigger a restore of the backup: the user needs to provide a script for that (it could be reused from greenboot, but might also be something else). It is the responsibility of the customer to make that decision.

Background

While there are already many layers of protection (xfs, the etcd bbolt backend including its robustness tests), according to Murphy's law something will still go wrong at some point.

For an example, see https://issues.redhat.com/browse/OCPBUGS-28380

Customer Considerations

Probably want to review design with key customers.

Documentation Considerations

Documentation in the backup/restore section needs to be augmented with a new chapter on "auto-recovery".

Interoperability Considerations

none

links to

demo

design

Assignee: Daniel Fröhlich

Reporter: Daniel Fröhlich

Contributors: Henry Geay de Montenon, Patryk Matuszak

QA Contact: John George

Doc Contact: Shauna Diaz

Architect: Jeremy Peterson

SME: Patryk Matuszak

Product Manager: Daniel Fröhlich

Product Operations Engineering Contact: Jon Thomas

Votes: 0

Watchers: 6

Created: 2024/04/29 2:44 PM

Updated: 2024/10/29 7:34 PM

Resolved: 2024/10/28 8:19 AM

Target end: 2024/10/31
