  1. OpenShift API Server
  2. API-1684

[TP] Support KMS on self-managed OCP


    • Type: Epic
    • Resolution: Unresolved
    • Priority: Major
    • BU Product Work
    • To Do
    • OCPSTRAT-108 - [TP] Support Kube KMS Integration in OCP (User-Provided)
    • 33% To Do, 33% In Progress, 33% Done
    • L

      Scenario:

      For an OCP cluster with external KMS enabled:

      • The customer loses the key to the external KMS 
      • The external KMS service is degraded or unavailable

      How do the above scenarios impact the cluster? The API may become unavailable.

       

      Goal:

      • Detection: The ability to detect these failure conditions and make them visible to the cluster admin.
      • Actuation: To what extent can we restore the cluster (API availability, control plane operators)? Recovering customer data is out of scope.

       

      Investigation Steps:

      Detection:

      • How do we detect issues with the external KMS?
      • How do we detect issues with the KMS plugins?
      • How do we surface the information that an issue happened with KMS?
        • Metrics / alerts? These will not work with SNO (single-node OpenShift)
        • ClusterOperatorStatus?
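
      One possible detection path, sketched below under stated assumptions: read the verbose health report of the kube-apiserver (for example from /readyz?verbose), pick out failing KMS-related checks, and turn them into a ClusterOperator-style Degraded condition. The check-name prefix "kms-provider", the reason string "KMSHealthCheckFailed", and the sample report are illustrative assumptions, not confirmed names; the real check names should be verified against the Kubernetes version in use.

```python
import re

# Matches one check line of a verbose health report, e.g. "[+]etcd ok" or
# "[-]kms-providers failed: reason withheld".
_CHECK_LINE = re.compile(r"^\[(?P<mark>[+-])\](?P<name>\S+)\s+.*$")


def parse_health_report(text):
    """Return {check_name: passed} for every check line in the report."""
    checks = {}
    for line in text.splitlines():
        match = _CHECK_LINE.match(line.strip())
        if match:
            checks[match.group("name")] = match.group("mark") == "+"
    return checks


def failing_kms_checks(checks, prefix="kms-provider"):
    """List failing checks whose name starts with the ASSUMED KMS prefix."""
    return sorted(
        name for name, ok in checks.items() if name.startswith(prefix) and not ok
    )


def degraded_condition(failing):
    """Build a ClusterOperator-style condition dict.

    The reason strings are hypothetical and used for illustration only.
    """
    if failing:
        return {
            "type": "Degraded",
            "status": "True",
            "reason": "KMSHealthCheckFailed",
            "message": "failing checks: " + ", ".join(failing),
        }
    return {"type": "Degraded", "status": "False", "reason": "AsExpected", "message": ""}


# Hypothetical sample report, not captured from a real cluster.
SAMPLE_REPORT = """\
[+]ping ok
[+]etcd ok
[-]kms-providers failed: reason withheld
readyz check failed
"""

if __name__ == "__main__":
    failing = failing_kms_checks(parse_health_report(SAMPLE_REPORT))
    print(degraded_condition(failing))
```

      Surfacing the result as a status condition rather than only as a metric or alert would also cover the SNO case raised above, but whether that is the right channel is exactly what this investigation needs to decide.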

      Actuation:

      • Is the control-plane self-recovering?
      • What actions are required for the user to recover the cluster partially/completely?
        • Complete: kube-apiserver? KMS plugin?
        • Partial: kube-apiserver? Etcd? KMS plugin?

      User stories that might result in KCS:

      • KMS / KMS plugin unavailable
        • Is there any degradation? (most likely not with KMS v2)
      • KMS unavailable and DEK not in cache anymore
        • Degradation will most likely occur, but what happens when the KMS becomes available again? Is the cluster self-recovering?
      • Key has been deleted and later recovered
        • Is the cluster self-recovering?
      • KMS / KMS plugin misconfigured
        • Is the apiserver rolled back to the previous healthy revision?
        • Is the misconfiguration properly surfaced?
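
      The first two user stories can be reasoned about with a toy model of a DEK cache in front of an external KMS. This is a deliberately simplified sketch, not a description of how kube-apiserver caches keys: ToyKMS, DEKCache, the TTL-based expiry, and the explicit clock argument are all hypothetical names and behaviors introduced for illustration.

```python
class KMSUnavailable(Exception):
    """Raised by the toy KMS when it cannot be reached."""


class ToyKMS:
    """Stand-in for an external KMS; 'unwrapping' is a plain dict lookup."""

    def __init__(self, wrapped_to_dek):
        self.wrapped_to_dek = dict(wrapped_to_dek)
        self.available = True

    def unwrap(self, wrapped_dek):
        if not self.available:
            raise KMSUnavailable("KMS unreachable")
        return self.wrapped_to_dek[wrapped_dek]


class DEKCache:
    """Caches unwrapped DEKs for ttl seconds, driven by an explicit clock."""

    def __init__(self, kms, ttl):
        self.kms = kms
        self.ttl = ttl
        self._entries = {}  # wrapped DEK -> (DEK, expiry time)

    def get(self, wrapped_dek, now):
        entry = self._entries.get(wrapped_dek)
        if entry is not None and now < entry[1]:
            return entry[0]  # cache hit: no KMS call needed
        dek = self.kms.unwrap(wrapped_dek)  # may raise KMSUnavailable
        self._entries[wrapped_dek] = (dek, now + self.ttl)
        return dek


if __name__ == "__main__":
    kms = ToyKMS({"wrapped-1": "dek-1"})
    cache = DEKCache(kms, ttl=10)
    cache.get("wrapped-1", now=0)   # warms the cache while the KMS is up
    kms.available = False
    cache.get("wrapped-1", now=5)   # still served from the cache
    try:
        cache.get("wrapped-1", now=20)  # cache expired and KMS is down
    except KMSUnavailable as err:
        print("degraded:", err)
    kms.available = True
    cache.get("wrapped-1", now=21)  # succeeds again once the KMS is back
```

      In this model the reader recovers on its own as soon as the KMS is reachable again; whether the real control plane behaves the same way is the open question the user stories above are meant to answer.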

      Plugins research:

      • What are the pros and cons of managing the plugins ourselves vs leaving that responsibility to the customer?
      • What is the list of KMS we need to support?
      • Do all the KMS plugins we need to use support KMS v2? If not, reach out to the provider.
      • HSMs?

      POCs:

      Acceptance Criteria:

      • Document the detection and actuation process in a KEP.
      • Generate new Jira work items based on the new findings.

            dgrisonn@redhat.com Damien Grisonnet
            akashem@redhat.com Abu H Kashem
            Ke Wang
            Ramon Acedo