Hybrid Cloud Console / RHCLOUD-30137

Ephemeral Cluster Outage Mitigation

    • Type: Epic
    • Resolution: Unresolved
    • Priority: Normal
    • ephemeral-cluster-outage-mitigation-jan24
    • To Do
    • 5% To Do, 0% In Progress, 95% Done

       

      Summary and goal

      A failed automated OpenShift upgrade rendered the Ephemeral cluster unusable. The Devprod team needs to work together to return it to service.

      Acceptance Criteria

      1. Hot-swap to CRCD to restore service to our users
      2. Create new Ephemeral Cluster
      3. Ensure we have solid documentation for creating a new Ephemeral cluster
      4. Work with OSD to attempt to return the old Ephemeral cluster to service
      5. Document the RCA and post-mortem after all technical work is complete

       

      Open Questions

      1. Why did this happen?
      2. How can we prevent it in the future?
      3. What larger implications does this event have for business continuity and disaster recovery?
      4. Do we have the processes and resources in place to deal with situations like this in the future?

      Assignee: Unassigned
      Reporter: Adam Drew (rh-ee-addrew)