-
Epic
-
Resolution: Unresolved
-
Major
-
[TP] Support KMS on self-managed OCP
-
BU Product Work
-
To Do
-
OCPSTRAT-108 - [TP] Support Kube KMS Integration in OCP (User-Provided)
-
33% To Do, 33% In Progress, 33% Done
-
L
Scenario:
For an OCP cluster with external KMS enabled:
- The customer loses the key to the external KMS
- The external KMS service is degraded or unavailable
How do the above scenarios impact the cluster? The API may be unavailable.
Goal:
- Detection: The ability to detect these failure conditions and make them visible to the cluster admin.
- Actuation: To what extent can we restore the cluster (API availability, control plane operators)? Recovering customer data is out of scope.
Investigation Steps:
Detection:
- How do we detect issues with the external KMS?
- How do we detect issues with the KMS plugins?
- How do we surface the information that an issue happened with KMS?
- Metrics / Alerts? Will not work with SNO
- ClusterOperatorStatus?
Actuation:
- Is the control-plane self-recovering?
- What actions are required for the user to recover the cluster partially/completely?
- Complete: kube-apiserver? KMS plugin?
- Partial: kube-apiserver? Etcd? KMS plugin?
User stories that might result in KCS:
- KMS / KMS plugin unavailable
- Is there any degradation? (most likely not with KMS v2)
- KMS unavailable and DEK no longer in cache
- Degradation will most likely occur, but what happens when the KMS becomes available again? Is the cluster self-recovering?
- Key has been deleted and later recovered
- Is the cluster self-recovering?
- KMS / KMS plugin misconfigured
- Is the apiserver rolled back to the previous healthy revision?
- Is the misconfiguration properly surfaced?
Plugins research:
- What are the pros and cons of managing the plugins ourselves vs leaving that responsibility to the customer?
- What is the list of KMS we need to support?
- Do all the KMS plugins we need to use support KMS v2? If not, reach out to the provider.
- HSMs?
POCs:
- Have a running POC of KMS on OCP to iterate over the user stories and start testing.
- Have a hacked version of o/k that includes https://github.com/kubernetes/enhancements/tree/master/keps/sig-auth/3926-handling-undecryptable-resources, so that we can easily take the same actions to fix clusters that customers will have available in 4.17.
Acceptance Criteria:
- Document the detection and actuation process in a KEP.
- Generate new Jira work items based on the new findings.
- depends on: AUTH-346 Make it possible to remove resources that cannot be accessed due to encryption issues (Closed)
- is related to: OCPSTRAT-108 [TP] Support Kube KMS Integration in OCP (User-Provided) (In Progress)