Epic
Resolution: Unresolved
ephemeral-cluster-outage-mitigation-jan24
To Do
5% To Do, 0% In Progress, 95% Done
Summary and goal
A failed automated OpenShift upgrade left the Ephemeral cluster unusable. The Devprod team needs to work together to restore service.
Acceptance Criteria
- Hot-swap to CRCD to restore service to our users
- Create new Ephemeral Cluster
- Ensure we have solid documentation for creating a new Ephemeral cluster
- Work with OSD to try to return the old Ephemeral cluster to service
- Document the RCA and post-mortem once all technical work is complete
Open Questions
- Why did this happen?
- How can we prevent it in the future?
- What larger implications does this event have for business continuity and disaster recovery?
- Do we have the processes and resources in place to deal with situations like this in the future?