OpenShift Container Platform (OCP) Strategy / OCPSTRAT-1501

Tokenized Auth Enablement for OLM-managed Operators on GCP


    • 25% To Do, 25% In Progress, 50% Done

      Goal: Managed OpenShift consumers (e.g. ROSA, ARO, OSD) and self-managed OpenShift customers on cloud providers (AWS, Azure or GCP) can rely on layered product operators in the cluster to use short-lived authentication token services (e.g. AWS STS, Azure managed identity, GCP workload identity) to authenticate with the cloud provider's API. The configuration experience to get an operator to do this is streamlined and standardized across all operators.

      Background: When components on OpenShift communicate with cloud provider APIs, they need to authenticate first. This can be done in two ways: with a static, long-lived set of credentials generated specifically for the workload ahead of time, or by granting the workload permission to assume a role / policy created ahead of time, so that it requests temporary authentication tokens at runtime and refreshes them regularly. In both cases the credentials are associated with the permissions the workload needs on the cloud provider API. The OpenShift core platform has adopted support for short-lived tokens on AWS with OCPPLAN-5656 and is adding support on Azure with OCPBU-8. OLM-managed operators have so far lacked a structured approach to enablement and configuration.
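
      The second approach can be illustrated with a minimal, hypothetical Python model; it is not the CCO, OLM or any cloud SDK. `fetch_token` stands in for whatever call exchanges the workload's identity for a temporary token (for example a GCP STS token exchange), and the five-minute refresh margin is an assumed value.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class ShortLivedTokenSource:
    """Hypothetical token source: caches a temporary token and refreshes it
    shortly before it expires, so the workload never holds a long-lived secret."""

    # Stand-in for the real exchange call; returns (token, expires_at_epoch_seconds).
    fetch_token: Callable[[], Tuple[str, float]]
    # Assumed safety margin: refresh this many seconds before expiry.
    refresh_margin_s: float = 300.0
    # Injectable clock so the refresh logic can be tested deterministically.
    clock: Callable[[], float] = time.time
    _token: Optional[str] = None
    _expires_at: float = 0.0

    def token(self) -> str:
        # Fetch a new token if none is cached yet or the cached one is about to expire.
        if self._token is None or self.clock() >= self._expires_at - self.refresh_margin_s:
            self._token, self._expires_at = self.fetch_token()
        return self._token
```

      Refreshing ahead of expiry, rather than after an API call fails, keeps the workload from ever presenting an expired token; the static-credential alternative would simply return the same secret forever.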

      Benefit for customers: Short-lived token authentication is considered more secure because, if a token accidentally leaks, the window in which it can be exploited is small, and no reconfiguration of the workload is required to get a new token. With static, long-lived credentials, by contrast, exploitation can continue over a potentially long period if the leak goes undetected, and rotating the credentials once the leak or exploitation is detected causes administrative overhead. An OpenShift deployment in which both the core platform and optional OLM-managed operators exclusively use short-lived tokens for API authentication is considered part of a strong security posture.

      Why is this important now: Customers increasingly prefer short-lived token authentication for API access in their cloud accounts and are starting to enforce it via policies. A workload / product that doesn't support it typically requires an exception from the customer's InfoSec team. Currently, support for this authentication method is fragmented across the layered product portfolio and the catalog of optional, OLM-managed operators: only a subset support some form of short-lived token usage, and each of those only for a specific cloud provider. The configuration experience to enable it varies between operators and is typically manual. If left unchanged, this could become an adoption barrier. Additionally, HyperShift-based deployments are going to use short-lived token authentication exclusively and will have no first-class support for operators that do not support it.

      Outcomes:

      • As a customer of OpenShift layered products, I need to be able to fluidly, reliably and consistently install and use OpenShift layered product Kubernetes Operators in my clusters, while leveraging short-lived token authentication throughout my deployment.
      • As a customer of OpenShift on GCP, overall I expect OpenShift as a platform to function equally well with tokenized authentication as it does with static, long-lived credentials. I expect the same from the Kubernetes Operators under the Red Hat brand (that need to reach cloud APIs) in that tokenized workflows are equally integrated and workable.
      • As the managed services offering a downstream, opinionated, supported and managed lifecycle of OpenShift (in the form of OSD on GCP), the OpenShift platform should have integration with core platform operators that is as close to native as possible when clusters use tokenized cloud auth, driving the use of layered products.

      Current Situation:

      • OLM-managed operators today are unable to request cloud credentials via OpenShift's Credential API when installed on a cluster with short-lived authentication enabled (currently GCP Workload Identity; see CCO-114 for GCP WI-related work). The CloudCredentialOperator component that would be used for this currently has no support for OLM-managed operators.
      • On managed OpenShift Dedicated (OSD), to enable an OLM-managed operator in a managed cluster, customers are required to register the operator with OCM (OpenShift Cluster Manager) before installing it via OperatorHub or directly on the cluster.
        • Users are unaware of which operators request credentials
        • Users are not warned that operator installation will fail 
        • Users subsequently are unaware why the installation failed
        • Operators timeout waiting for credentials
        • Users should be informed of steps required for a successful installation
      • Any configuration for short-lived token support for OLM-managed operator installation is currently command-line only

       

      Execution Plan:

      Some of the below workstreams will be running in parallel. Proper product documentation and QE is part of all of them.

      Workstream 5 - OCP Console support for short-lived authentication flows (OCPSTRAT-962); to be tackled with OCPBUGS-33756

      • based on Workstream 9, the installation and configuration experience for any OLM-managed operator using short-lived token authentication is streamlined using the OCP console
      • the OCP console helps in detecting when the cluster itself is already using short-lived tokens for core functionality
      • the OCP console helps discover operators capable of short-lived token authentication and their IAM permission requirements

      Workstream 8 - Standardized update flow for OLM-managed operators leveraging short-lived token authentication (OCPSTRAT-95)

      • A standardized flow is defined to warn the user if the IAM permissions required by an OLM-managed operator configured with short-lived token authentication change during or after an update
      • this flow is implemented in the OCP console as well as in the OLM APIs
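
      The core of such a warning is a comparison of the permission sets required by the installed and the proposed operator version. The Python below is a hypothetical sketch of that comparison only; how the permission lists are obtained from an operator bundle, and the wording of the warning, are assumptions, and the GCP permission names are merely examples.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class PermissionChange:
    """Difference between the IAM permissions two operator versions require."""

    added: FrozenSet[str]
    removed: FrozenSet[str]


def diff_permissions(current: Iterable[str], proposed: Iterable[str]) -> PermissionChange:
    cur, new = frozenset(current), frozenset(proposed)
    return PermissionChange(added=new - cur, removed=cur - new)


def update_warning(current: Iterable[str], proposed: Iterable[str]) -> Optional[str]:
    """Return a warning for the user, or None if the required permissions are unchanged."""
    change = diff_permissions(current, proposed)
    if not change.added and not change.removed:
        return None
    parts = []
    if change.added:
        # Newly required permissions would need to be granted to the operator's cloud identity.
        parts.append("grant: " + ", ".join(sorted(change.added)))
    if change.removed:
        # Permissions no longer required could be revoked to keep least privilege.
        parts.append("may revoke: " + ", ".join(sorted(change.removed)))
    return "IAM permissions changed; " + "; ".join(parts)
```

      For example, moving from a version that needs `storage.buckets.get` and `storage.objects.create` to one that also needs `storage.objects.delete` yields a warning asking the user to grant `storage.objects.delete`.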

      Workstream 9 - CloudCredentialOperator-based flow for OLM-managed operators and GCP WIF (OCPSTRAT-922)

      • CCO gets a new mode in which it can reconcile a GCP credential request for OLM-managed operators
      • A standardized flow is leveraged to guide users in preparing their GCP IAM policies and roles with permissions that are required for OLM-managed operators 
      • A standardized flow is defined in which users can configure OLM-managed operators to leverage GCP WIF
      • The OCP console will be enabled to support this
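
      As an illustration of what reconciling a GCP credential request could produce for an operator, the Python below builds a credential configuration in Google's `external_account` format for Workload Identity Federation, which client libraries use to exchange a Kubernetes service account token for a short-lived GCP access token. This is an assumed output shape, not the CCO's actual implementation: the project number, pool ID, provider ID and service account email are hypothetical user-supplied inputs, and the projected token path is an assumption.

```python
import json

# Assumed path of the projected, OIDC-signed service account token inside the pod.
DEFAULT_TOKEN_PATH = "/var/run/secrets/openshift/serviceaccount/token"


def wif_credential_config(project_number: str, pool_id: str, provider_id: str,
                          service_account_email: str,
                          token_path: str = DEFAULT_TOKEN_PATH) -> str:
    """Build a Google 'external_account' credential configuration as a JSON string."""
    audience = (f"//iam.googleapis.com/projects/{project_number}/locations/global/"
                f"workloadIdentityPools/{pool_id}/providers/{provider_id}")
    config = {
        "type": "external_account",
        "audience": audience,
        # The projected token is a JWT signed by the cluster's OIDC issuer.
        "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
        # Google STS endpoint that exchanges the JWT for a federated token.
        "token_url": "https://sts.googleapis.com/v1/token",
        # Impersonate the GCP service account that holds the operator's IAM roles.
        "service_account_impersonation_url": (
            "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/"
            f"{service_account_email}:generateAccessToken"),
        # Read the subject token from a file that the kubelet keeps refreshed.
        "credential_source": {"file": token_path},
    }
    return json.dumps(config, indent=2)
```

      Only the four identifiers vary per operator, which is consistent with the goal of asking users for a very small set of parameters.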

      Workstream 10 - GCP WIF enablement for OLM-managed operators (OCPSTRAT-1377)

      • Short-lived token authentication using GCP WI for: OADP, Cluster Logging, GCP PD CSI operator, Filestore CSI operator (as applicable), OpenShift Pipelines, Red Hat Quay, ODF, Ansible Automation Platform
      • these operators only support this flow on OCP 4.17 or newer

       

      Open Questions:

       

      Definition of done:

      1. Main success scenario - high-level user story (using GCP Workload Identity as an example)
        1. customer creates an OpenShift cluster with GCP Workload Identity (WI)
        2. customer wants basic (table-stakes) features such as GCP PD, OADP or Logging
        3. customer discovers the cluster is in WI mode and the desired operators are WI-capable
        4. customer sees, in OperatorHub on their cluster, the tasks needed to prepare for the operator
        5. customer prepares GCP IAM roles/policies in anticipation of the Operator they want, using what they get from OperatorHub
        6. customer provides a very minimal set of parameters on the Operator's OperatorHub page
        7. The cluster can automatically set up the Operator using the provided tokenized credentials, and the Operator functions as expected
        8. Cluster and Operator upgrades are taken into account and automated
      2. Managed OpenShift scenarios - high-level user story
        1. The same as above, but the OSD CLI would assist with OSD role/policy creation
        2. The same as above, but the oc CLI would assist with cloud role/policy management (per respective cloud provider for the cluster)
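
      Step 3 of the main success scenario could rest on a simple check like the one below. This is a hypothetical Python sketch: the inputs mirror values a console might read (the cluster's platform type, the service account issuer from the cluster Authentication config, and the Cloud Credential Operator mode), and both the three-part heuristic and the `features.operators.openshift.io/token-auth-gcp` annotation name are assumptions rather than confirmed behavior.

```python
from typing import List, Mapping

# Assumed ClusterServiceVersion annotation marking an operator as GCP WIF capable.
GCP_TOKEN_AUTH_ANNOTATION = "features.operators.openshift.io/token-auth-gcp"


def cluster_uses_gcp_wif(platform: str, service_account_issuer: str, cco_mode: str) -> bool:
    """Assumed heuristic: a GCP cluster with a custom OIDC issuer and CCO in Manual mode."""
    return platform == "GCP" and bool(service_account_issuer) and cco_mode == "Manual"


def wif_capable_operators(csv_annotations: Mapping[str, Mapping[str, str]]) -> List[str]:
    """Return the operators whose annotations declare GCP token auth support."""
    return sorted(name for name, annotations in csv_annotations.items()
                  if annotations.get(GCP_TOKEN_AUTH_ANNOTATION) == "true")
```

      Keeping the check as a pure function over already-fetched values separates the decision logic from however the console or CLI retrieves the cluster objects.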

      Desired effect:

      • Growth is the acquisition of net new usage of the platform. This can be new workloads not previously able to be supported, new markets not previously considered, or new end users not previously served.
      • Retention is maintaining and expanding existing use of the platform. This can be more effective use of tools, competitive pressures, and ease of use improvements.
      • Both growth and retention are effects of this effort.
        • Customers have strict requirements around using only token-based cloud credential systems for workloads in their cloud accounts, which include OpenShift clusters in all forms.
          • We gain new customers both from those that have waited for token-based auth/auth from OpenShift and from those that are new to OpenShift, with strict requirements around cloud account access
          • We retain customers that are going through both cloud-native and hybrid-cloud journeys that all inevitably see security requirements driving them towards token-based auth/auth.

      Related References (for AWS STS)

       

              Daniel Messer
              Aaren de Jong
              Scott Dodson
              Brett Tofel, Daniel Messer, Eric Fried, James Harrington, Jeremiah Stuever, Ju Lim, Mark Old, Mike Worthington, Scott Dodson, Shreyans Mulkutkar, Victor Kareh
              Tushar Katarki