  1. Red Hat OpenShift Control Planes
  2. CNTRLPLANE-2122

Write an OpenShift Enhancement Proposal for migration and recovery

      Open an enhancement proposal in https://github.com/openshift/enhancements/ detailing the work needed for migration of encrypted objects as well as recovery in case of loss of key or temporary loss of access to the KMS provider.

      The following was taken from the TP epic's description and can be used as a basis to start the work:

      For an OCP cluster with external KMS enabled:

      • The customer loses the key to the external KMS
      • The external KMS service is degraded or unavailable
      • The customer misconfigures encryption

      How do the above scenarios impact the cluster? The API may be unavailable.

      Goal:

      • Detection: The ability to detect these failure conditions and make them visible to the cluster admin.
      • Actuation: To what extent can we restore the cluster (API availability, control plane operators)? Recovering customer data is out of scope.

      Investigation Steps:

      Detection:

      • How do we detect issues with the external KMS?
      • How do we detect issues with the KMS plugins?
      • How do we surface the information that an issue happened with KMS?
        • Metrics / alerts? These will not work with SNO (single-node OpenShift)
        • ClusterOperatorStatus?
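
To make the ClusterOperator option concrete, here is a minimal sketch of how probe results could be folded into a single Degraded-style condition. It is only an illustration under assumptions: `ProbeResult`, `degraded_condition`, the component names, and the reason strings are hypothetical and do not come from any existing operator code. The dictionary keys mirror the general shape of a status condition (type, status, reason, message).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ProbeResult:
    """Outcome of one hypothetical health probe (names are illustrative)."""
    component: str   # e.g. "kms-plugin" or "external-kms"
    healthy: bool
    detail: str = ""


def degraded_condition(results: List[ProbeResult]) -> Dict[str, str]:
    """Fold probe results into one Degraded-style condition dictionary."""
    failed = [r for r in results if not r.healthy]
    if not failed:
        # Reason string is an assumption, chosen only for the example.
        return {"type": "Degraded", "status": "False",
                "reason": "AsExpected", "message": ""}
    # List every failing component so the cause is visible without metrics.
    message = "; ".join(f"{r.component}: {r.detail or 'unhealthy'}" for r in failed)
    return {"type": "Degraded", "status": "True",
            "reason": "KMSUnavailable",  # hypothetical reason string
            "message": message}
```

A status condition of this kind does not depend on a monitoring stack, which is why it may be worth evaluating alongside metrics and alerts.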

      Actuation:

      • Is the control-plane self-recovering?
      • What actions are required for the user to recover the cluster partially/completely?

      User stories that might result in KCS:

      • KMS / KMS plugin unavailable
        • Is there any degradation? (most likely not with KMS v2)
      • KMS unavailable and DEK not in cache anymore
        • Degradation will most likely occur, but what happens when the KMS becomes available again? Is the cluster self-recovering?
      • Key has been deleted and later recovered
        • Is the cluster self-recovering?
      • KMS / KMS plugin misconfigured
        • Is the apiserver rolled back to the previous healthy revision?
        • Is the misconfiguration properly surfaced?
      • Backup and restore
        • Is there any special procedure with KMS encryption enabled?
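
Several of the stories above ask whether the cluster is self-recovering. One way to answer that consistently across scenarios is to record API health samples during a test and compute how long the API took to become stably healthy once the KMS was restored. The sketch below assumes such samples were collected by some external test harness; `time_to_recover` and the sample format are hypothetical helpers, not part of any OpenShift tooling.

```python
from typing import List, Optional, Tuple

# (seconds since test start, whether the API answered successfully)
Sample = Tuple[float, bool]


def time_to_recover(samples: List[Sample], kms_restored_at: float) -> Optional[float]:
    """Seconds from KMS restoration until the API is healthy and stays healthy.

    Returns None if the last observed sample is unhealthy or if there are no
    samples at or after the restoration time.
    """
    after = sorted(s for s in samples if s[0] >= kms_restored_at)
    recovered_at: Optional[float] = None
    for t, healthy in after:
        if healthy:
            # Keep the start of the current healthy streak.
            if recovered_at is None:
                recovered_at = t
        else:
            # A later failure means the earlier streak was not a recovery.
            recovered_at = None
    return None if recovered_at is None else recovered_at - kms_restored_at
```

A `None` result with no manual intervention suggests the scenario is not self-recovering within the observation window and may need a documented recovery procedure.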

      Acceptance Criteria:

      • Document the detection and actuation process in an OpenShift enhancement proposal (EP)
      • Generate new Jira work items based on the new findings

              Unassigned Unassigned
              dgrisonn@redhat.com Damien Grisonnet