  1. OpenShift Container Platform (OCP) Strategy
  2. OCPSTRAT-1354

Auto-Recover MicroShift from manual backups


      Feature Overview (aka. Goal Summary)  

      MicroShift can automatically recover from manual backups when startup fails, e.g. due to a corrupt etcd database or human error.

      Goals (aka. expected user outcomes)

      The goal of this feature is to provide an additional layer of protection and robustness for edge devices, especially against sudden loss of power or user error. Users/admins can create manual backups using `microshift backup` (e.g. with a daily cron job) and then point MicroShift to the folder containing those backups. If startup fails, MicroShift restores the backups one at a time, newest first, and tries to start with each.
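A minimal sketch of what such a scheduled backup job could look like, written as a small Python helper that cron might call. The backup root, the `manual-<timestamp>` naming scheme, and the assumption that `microshift backup` takes the destination path as its argument are illustrative and not confirmed by this issue; the `run` callable is injected so the sequence can be inspected without touching the system.

```python
import datetime
from pathlib import Path
from typing import Callable, List, Sequence

# Assumption: location chosen for illustration only.
BACKUP_ROOT = Path("/var/lib/microshift-backups")


def backup_commands(root: Path, now: datetime.datetime) -> List[List[str]]:
    """Build the command sequence for one manual backup run."""
    # A timestamped name keeps backups sortable from newest to oldest.
    target = root / now.strftime("manual-%Y%m%d-%H%M%S")
    return [
        ["systemctl", "stop", "microshift"],    # MicroShift must be stopped for the backup
        ["microshift", "backup", str(target)],  # assumed form: destination path as argument
        ["systemctl", "start", "microshift"],
    ]


def run_backup(root: Path, run: Callable[[Sequence[str]], None]) -> None:
    """Run every step; always try to start MicroShift again, even if the backup fails."""
    stop, backup, start = backup_commands(root, datetime.datetime.now())
    run(stop)
    try:
        run(backup)
    finally:
        run(start)
```

A cron-invoked script could call `run_backup(BACKUP_ROOT, lambda cmd: subprocess.run(cmd, check=True))` after importing `subprocess`; when to schedule it remains the user's decision.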

      Requirements (aka. Acceptance Criteria):

      1. Provide a way to configure "autoRecover".
      2. AutoRecover must ensure that the backup matches the current version/ostree commit (as the existing backup/restore for updates does).
      3. Integrate with the existing backup/restore logic used during updates.
      4. The failing config must be backed up for later post-mortem analysis of why it failed.
      5. All steps and attempts must be logged verbosely and explicitly, so that the sequence of events can be reconstructed easily.
      6. Try all available backups, from newest to oldest.
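The requirements above can be sketched as a small recovery loop. This is an illustration under stated assumptions, not MicroShift's implementation: the `Backup` record, its `version` field, and the injected `restore` and `try_start` callables are hypothetical stand-ins for the existing backup/restore logic.

```python
import logging
import shutil
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List, Optional

log = logging.getLogger("auto-recover")


@dataclass(frozen=True)
class Backup:
    path: Path
    version: str    # version / ostree commit the backup was taken on (hypothetical field)
    created: float  # creation time as a Unix timestamp


def candidates(backups: List[Backup], current_version: str) -> List[Backup]:
    """Return the backups matching the running version, newest first."""
    matching = [b for b in backups if b.version == current_version]
    return sorted(matching, key=lambda b: b.created, reverse=True)


def auto_recover(
    backups: List[Backup],
    current_version: str,
    data_dir: Path,
    failed_dir: Path,
    restore: Callable[[Backup], None],
    try_start: Callable[[], bool],
) -> Optional[Backup]:
    """Try each suitable backup until startup succeeds; return the one that worked."""
    # Preserve the failing data once, before any restore overwrites it.
    # copytree requires that failed_dir does not exist yet.
    if data_dir.exists():
        log.info("saving failed data from %s to %s", data_dir, failed_dir)
        shutil.copytree(data_dir, failed_dir)
    for backup in candidates(backups, current_version):
        log.info("restoring backup %s (version %s)", backup.path, backup.version)
        restore(backup)
        if try_start():
            log.info("startup succeeded with backup %s", backup.path)
            return backup
        log.warning("startup failed with backup %s, trying next", backup.path)
    log.error("no suitable backup led to a successful startup")
    return None
```

Filtering by version before sorting means a newer backup from a different commit is never restored, and every attempt leaves a log line so the sequence of events can be reconstructed afterwards.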

      Questions to Answer (Optional):

      1. How to integrate with greenboot / ostree rollbacks? A: Probably not an issue, as rollbacks/restores are triggered only during an active upgrade (when a new commit is staged).
      2. Do we try only the latest backup, or, if several are suitable, proceed with older ones? A: We should keep a list of backups that is worked through from newest to oldest.
      3. How to avoid potential conflicts with the automatic backups created for updates/rollbacks? A: Probably not an issue, see Q#1.

      Out of Scope

      1. Creating the backups: only the user/customer knows when there is a good time for this, as MicroShift needs to be stopped for the backup.
      2. Controlling how many backups are kept on disk: that is the duty of the user/customer.
      3. Deciding when to trigger a restore of the backup: the user needs to provide a script for that (it could be reused from greenboot, but might also be something else). It is the responsibility of the customer to make that decision.
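Since retention is left to the user, a user-side pruning helper might look like the following sketch. It assumes that each backup is a directory directly under the backup root and that directory names embed a sortable timestamp; both are illustrative assumptions, not something this feature defines.

```python
import shutil
from pathlib import Path
from typing import List


def prune_backups(root: Path, keep: int) -> List[Path]:
    """Delete all but the `keep` newest backup directories; return what was removed."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    # Sort by directory name, assuming names embed a sortable timestamp.
    dirs = sorted(
        (p for p in root.iterdir() if p.is_dir()),
        key=lambda p: p.name,
        reverse=True,
    )
    removed = dirs[keep:]
    for path in removed:
        shutil.rmtree(path)
    return removed
```

Running this after each successful manual backup keeps disk usage bounded while leaving several candidates for recovery.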

      Background

      While there are already many protection layers (xfs, the etcd bbolt backend incl. robustness tests), according to Murphy's law something will still go wrong at some point.

      See, for example, https://issues.redhat.com/browse/OCPBUGS-28380


      Documentation Considerations

      Documentation in the backup/restore section needs to be augmented with a new chapter on "auto-recovery".

       

      Interoperability Considerations

      none

       

            dfroehli42rh Daniel Fröhlich
            Henry Geay de Montenon, Patryk Matuszak
            Matthew Werner
            Jeremy Peterson