Type: Feature
Resolution: Done
Priority: Major
Fix Version/s: None
Affects Version/s: None
Component/s: Core
Labels:

Activity Type: Product / Portfolio Work
Parent Link: None
Blocked: False
Blocked Reason: None
Ready: None
Size: None

Target Version: None
Release Blocker: None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Review Complete:
PX Priority Data:
PX Impact Score:
PX Technical Impact:
PX Impact Range:
None
PX Scheduling Request:
PX Technical Impact Notes:
None

Goal

Note: This is an internal improvement. There are no user-facing deliverables.

There are a few areas to cover for Disaster Recovery (DR):

Finish rewriting the existing DR Bash scripts in Go.
Add guardrails to the code so that the customer cannot cause additional damage to the cluster during disaster recovery.
Clean up technical debt in the MCO repo and the installer.

Why is this important?

When a cluster has an event that results in, for example, quorum loss, the admin is in a very stressful situation. If we can provide a clean solution to this event with well-thought-out tools, the admin will be pleased.

This also helps us avoid customer situations like this one:
https://docs.google.com/document/d/1ULGQARWdxjujWpSyncY0pKrUG9OcT0PlhEmYVwrPEAE/edit?ts=5eb18ea3

Scenarios

A customer has a cluster event that causes loss of quorum.

incorporates

RFE-1649 Test a supported way to move /var/lib/etcd to a new disk as day 2 task

Closed

is related to

RFE-1287 Provide ability to rollback OpenShift cluster to previous release (Suggest: Automated Etcd Backups/Restores)

Closed

OCPSTRAT-215 [internal] Automated restore of etcd from external target (design)

OCPSTRAT-539 Enhance recovery procedure for full control plane failure

Closed

relates to

API-1376 OpenShift 4.X supports an official process to shut down, restart, and resume an OpenShift cluster from a powered off state, this function should be continuously validated, supported, and guaranteed for consumers for DR and lifecycle use-cases

RFE-3634 Add option --skip-hash-check=true to the ETCD recovery pod

Approved

OCPSTRAT-464 Automated backups of etcd (external targets)

Backlog

OCPSTRAT-403 [Tech Preview] Automated backups of etcd (local destination)

Closed

links to

KCS 6987574: Can an etcd backup be used to restore state on a new cluster?

openshift/openshift-docs#30640: cnf 4.6 references

Assignee: William Caban

Reporter: Greg Blomquist (Inactive)

Need Info From: None

Contributors: Dean West

Architect: David Eads

QA Contact: Ge Liu

Doc Contact: Matthew Werner

Product Operations Engineering Contact: Eric Rich

Votes: 12

Watchers: 45

Created: 2019/11/27 7:33 PM

Updated: 2025/12/26 2:53 PM

Resolved: 2024/04/04 2:12 AM
