Sub-task
Resolution: Unresolved
There are many failure scenarios in the KMS feature that could lead to a disruption.
The epic lists user stories that help identify what could potentially happen in a live cluster.
For each of these scenarios, we need to figure out how an operator or a cluster-admin would be able to recover the cluster. The recovery can be either complete or partial:
- Complete recovery => the cluster is back to normal
- Partial recovery => we are able to surface an issue with the KMS to the cluster-admin
It might be simpler to go through the user stories in the epic and attempt a manual recovery for each of them first, before looking into this spike.