  1. OpenShift Container Platform (OCP) Strategy
  2. OCPSTRAT-403

Automated backups of etcd (local destination)

    • OCPSTRAT-16 OpenShift - Kubernetes and Core Platform
    • 0% To Do, 0% In Progress, 100% Done

      BU Priority Overview

      Initiative: Improve etcd disaster recovery experience (part1)

      Goals

      The current etcd backup and recovery process is described in our docs: https://docs.openshift.com/container-platform/4.12/backup_and_restore/control_plane_backup_and_restore/backing-up-etcd.html

      The current process leaves it up to the cluster-admin to figure out how to take consistent backups by following the documented procedure.

      This feature is part of a progressive delivery to improve the cluster-admin experience for backup and restore of etcd clusters to a healthy state.

      Scope of this feature:

      • etcd quorum loss (2-node failure) on a 3-node OCP control plane
      • etcd degradation (1-node failure) on a 3-node OCP control plane
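
      The two scenarios above follow from majority quorum: etcd needs more than half of its members available to commit writes. The sketch below is illustrative only; the function names are hypothetical and not part of any OpenShift or etcd tooling.

```python
def quorum_size(members: int) -> int:
    """Smallest majority of a cluster with `members` voting members."""
    return members // 2 + 1


def classify(members: int, failed: int) -> str:
    """Classify the cluster state after `failed` of `members` members are lost."""
    healthy = members - failed
    if failed == 0:
        return "healthy"
    if healthy >= quorum_size(members):
        return "degraded"      # quorum retained, but no further failure is tolerated
    return "quorum lost"       # remaining members cannot form a majority


if __name__ == "__main__":
    for failed in range(3):
        print(f"3 members, {failed} failed: {classify(3, failed)}")
```

      For a 3-node control plane the quorum is 2, so one failed node leaves the cluster degraded and two failed nodes cause quorum loss.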

      Execution Plans

      • Improve etcd disaster recovery e2e test coverage
      • Design an automated backup API. The initial target is a local destination.
      • Should provide a way (e.g. script or tool) for the cluster-admin to validate that backup files remain valid over time (e.g. account for disk failures corrupting the backup)
      • Should document updated manual steps to restore from a local backup. These steps should be part of the e2e test coverage.
      • Should document manual steps to copy backup files to a destination outside the cluster (e.g. an ssh copy that a cluster-admin can use in a CronJob)
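
      One possible shape for the "validate backup files over time" item is a checksum manifest: record a SHA-256 digest per file when the backup is taken, then re-check the digests periodically. This is a minimal sketch under stated assumptions, not the tool this feature will deliver; the manifest name `checksums.json` and all function names are hypothetical.

```python
import hashlib
import json
from pathlib import Path

MANIFEST_NAME = "checksums.json"  # hypothetical manifest file name


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large snapshots do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(backup_dir: Path) -> dict[str, str]:
    """Record a checksum for every regular file in the backup directory."""
    checksums = {
        entry.name: sha256_of(entry)
        for entry in sorted(backup_dir.iterdir())
        if entry.is_file() and entry.name != MANIFEST_NAME
    }
    (backup_dir / MANIFEST_NAME).write_text(json.dumps(checksums, indent=2))
    return checksums


def verify_manifest(backup_dir: Path) -> list[str]:
    """Return the names of files that are missing or whose checksum changed."""
    recorded = json.loads((backup_dir / MANIFEST_NAME).read_text())
    problems = []
    for name, expected in recorded.items():
        path = backup_dir / name
        if not path.is_file() or sha256_of(path) != expected:
            problems.append(name)
    return problems
```

      A check like this only detects that a file changed after the manifest was written; it does not prove the snapshot can be restored, and a manifest stored on the same disk can be damaged along with the backup. Keeping a copy of the manifest with the off-cluster copy of the backup would reduce that risk.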

            wcabanba@redhat.com William Caban
            wcabanba@redhat.com William Caban
            Wei Sun
            Matthew Werner
            David Eads
            Eric Rich
            Votes: 1
            Watchers: 15
