Container / Cluster Management (XCM) Strategy / XCMSTRAT-785

OSD/ROSA - Verify backup and recovery process


    • Progress: 100% To Do, 0% In Progress, 0% Done
    • This epic is in a Blocked state and is waiting for SDA-6009 to be prioritized in order for SRE-P.

      User Story

      Note: updated based on ADR 0015

      Scope: validate the process for customers to back up and recover their own clusters, including any platform changes required to facilitate the process.

       
      As a Managed OpenShift customer, I want to know how Red Hat recommends that I implement backup and recovery, and I want to ensure that the recovery process is fully tested and supported.

      As an SRE for Managed OpenShift, I want to be able to quickly and reliably restore a failed cluster in a disaster recovery (DR) scenario, so that the customer can restore their application backups.

      NOTE: This does not mean SRE taking etcd backups and using them to restore a cluster. For classic clusters, "restore a failed cluster" means bringing up a cluster with the same cluster name and domain name, but with no etcd data.

      Acceptance Criteria

      • Empower and enable our customers to be confident in the resiliency of our offering.
      • Meet minimum requirements for compliance and customer adoption
      • Red Hat provides a reference architecture for the recommended backup configuration
      • Customers bring their own backup solution
      • Customers must be able to succeed with their recovery efforts
      • The backup and recovery process is well documented and tested

      Out of Scope

      • We will not become a backup vendor.
      • We will not be responsible for a customer’s applications and data
      • We will not be responsible for recovery from incidents that are not caused by Red Hat

      Default Done Criteria

      • All existing/affected SOPs have been updated.
      • New SOPs have been written.
      • Internal training has been developed and delivered.
      • The feature has both unit and end-to-end tests passing in all test
        pipelines and through upgrades.
      • If the feature requires QE involvement, QE has signed off.
      • The feature exposes metrics necessary to manage it (VALET/RED).
      • The feature has had a security review.
      • A contract impact assessment has been completed.
      • The Service Definition has been updated, if needed.
      • Documentation is complete.
      • The Product Manager has signed off on the staging/beta implementation.

      Dates

      Integration Testing:
      Beta:
      GA:

      Current Status

      GREEN | YELLOW | RED
      GREEN = On track, minimal risk to target date.
      YELLOW = Moderate risk to target date.
      RED = High risk to target date, or blocked and need to highlight potential
      risk to stakeholders.

      References

      Links to Gdocs, github, and any other relevant information about this epic.

            Unassigned
            Will Gordon (wgordon.openshift)
            Aaren de Jong
            Will Gordon
            Votes: 0
            Watchers: 3
