OCPSTRAT-529

Improve disaster recovery test coverage for etcd

    • OCPSTRAT-16 OpenShift - Kubernetes and Core Platform
    • 0% To Do, 0% In Progress, 100% Done
    • Program Call

      Goal

      Note: This is an internal improvement. There are no user-facing deliverables.

      There are a few areas to cover for Disaster Recovery (DR):

      • Finish rewriting the existing DR Bash scripts in Go
      • Add guardrails to the code that prevent the customer from causing additional damage to the cluster during disaster recovery.
      • Clean up technical debt in the MCO repo and the installer.
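
      As a rough illustration of the guardrail item above, here is a minimal Go sketch of one possible check: refuse a destructive restore while the healthy etcd members still form a majority. The majority rule (quorum is members/2 + 1) is standard for etcd; everything else, including the names quorumSize, hasQuorum, checkRestoreAllowed and ErrQuorumIntact, is hypothetical and not taken from the actual DR tooling.

```go
package main

import (
	"errors"
	"fmt"
)

// quorumSize returns the minimum number of healthy voting members a
// cluster of the given size needs to keep quorum (a strict majority).
func quorumSize(members int) int {
	return members/2 + 1
}

// hasQuorum reports whether the healthy members still form a majority.
func hasQuorum(members, healthy int) bool {
	return members > 0 && healthy >= quorumSize(members)
}

// ErrQuorumIntact is a hypothetical sentinel error for this sketch.
var ErrQuorumIntact = errors.New("quorum is intact; refusing destructive restore")

// checkRestoreAllowed is a hypothetical guardrail (an assumption, not the
// real tooling): it rejects nonsensical input and refuses to proceed with
// a restore when the cluster has not actually lost quorum.
func checkRestoreAllowed(members, healthy int) error {
	if members <= 0 || healthy < 0 || healthy > members {
		return fmt.Errorf("invalid member counts: members=%d healthy=%d", members, healthy)
	}
	if hasQuorum(members, healthy) {
		return ErrQuorumIntact
	}
	return nil
}

func main() {
	// Three-member cluster with two healthy members: quorum still holds.
	fmt.Println(checkRestoreAllowed(3, 2))
	// Three-member cluster with one healthy member: quorum is lost.
	fmt.Println(checkRestoreAllowed(3, 1))
}
```

      A real guardrail would presumably derive the member counts from the cluster itself rather than take them as arguments; the sketch only shows the shape of a check that fails closed.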

      Why is this important?

      An event that results in quorum loss, for example, puts the cluster admin in a very stressful situation. If we provide a clean solution for such an event, backed by well-thought-out tools, the admin will be pleased.

      This also helps us avoid customer situations like this one:
      https://docs.google.com/document/d/1ULGQARWdxjujWpSyncY0pKrUG9OcT0PlhEmYVwrPEAE/edit?ts=5eb18ea3

      Scenarios

      1. A customer has a cluster event that causes loss of quorum.

            wcabanba@redhat.com William Caban
            blomquisg Greg Blomquist
            Dean West
            Ge Liu
            Matthew Werner
            David Eads
            Eric Rich
            Votes: 12
            Watchers: 45
