OpenShift Container Platform (OCP) Strategy / OCPSTRAT-529

Improve disaster recovery test coverage for etcd



      Goal

      Note: This is an internal improvement. There are no user-facing deliverables.

      There are a few areas to cover for Disaster Recovery (DR):

      • Finish rewriting the existing DR Bash scripts in Go.
      • Add guardrails to the code that prevent the customer from causing additional damage to the cluster during disaster recovery.
      • Clean up technical debt in the MCO repo and the installer.

      Why is this important?

      A cluster event that results in, for example, quorum loss is a very stressful situation for the admin. If we provide a clean solution for this event, backed by well-thought-out tools, the admin will be pleased.

      This also helps us avoid customer situations like this one:
      https://docs.google.com/document/d/1ULGQARWdxjujWpSyncY0pKrUG9OcT0PlhEmYVwrPEAE/edit?ts=5eb18ea3

      Scenarios

      1. A customer has a cluster event that causes loss of quorum.

              William Caban (wcabanba@redhat.com)
              Greg Blomquist (blomquisg)
              Dean West
              Ge Liu
              Matthew Werner
              David Eads
              Eric Rich
              Votes: 12
              Watchers: 44
