-
Outcome
-
Resolution: Unresolved
-
Critical
-
None
-
openshift-4.13, openshift-4.14
-
0% To Do, 67% In Progress, 33% Done
-
False
-
Goal: Managed OpenShift consumers on ARO and self-managed OpenShift customers on Azure can rely on layered product operators in the cluster to leverage short-lived authentication token services (Azure Managed Identity) to authenticate with the cloud provider's API. The configuration experience to get an operator to do this is streamlined and standardized across all operators.
Background: When components on OpenShift communicate with Azure APIs, they need to authenticate first. This can be done in two ways: a static and long-lived set of credentials generated specifically for the workload ahead of time, or the workload getting permissions to assume a role / policy generated ahead of time, requesting temporary authentication tokens at runtime that are refreshed regularly. In both cases the credentials are associated with the required permissions the workload needs on the Azure API. OpenShift core platform has adopted support on Azure with OCPBU-8. OLM-managed operators have so far not seen a structured approach to enablement and configuration.
Benefit for customers: Short-lived token authentication is considered more secure because, if a token accidentally leaks, the window in which it can be used for exploits is small. The workload also needs no reconfiguration to get a new token. With static, long-lived tokens, by contrast, exploitation can continue over a potentially long period of time if the leakage goes undetected, and changing long-lived tokens once the leakage / exploitation is detected causes administrative overhead. An OpenShift deployment where both the core platform and optional OLM-managed operators exclusively use short-lived tokens for Azure API authentication is considered part of a strong security posture.
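The difference between the two credential models can be illustrated with a minimal sketch. It does not call any Azure API: `Token`, `ShortLivedCredential`, `make_fake_issuer` and `exposure_window` are hypothetical names introduced only for illustration, and the one-hour lifetime and 60-second refresh margin are assumed values, not figures from this feature.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Token:
    value: str
    expires_at: float  # seconds on the caller's clock

    def is_valid(self, now: float) -> bool:
        return now < self.expires_at


class ShortLivedCredential:
    """Hands out a token and requests a new one shortly before expiry."""

    def __init__(self, issue: Callable[[float], Token], refresh_margin: float = 60.0):
        self._issue = issue                    # stand-in for the cloud token service
        self._refresh_margin = refresh_margin  # assumed value
        self._token: Optional[Token] = None

    def get(self, now: float) -> Token:
        # Refresh at runtime; the workload itself is never reconfigured.
        if self._token is None or now >= self._token.expires_at - self._refresh_margin:
            self._token = self._issue(now)
        return self._token


def make_fake_issuer(lifetime: float = 3600.0) -> Callable[[float], Token]:
    """Hypothetical issuer that returns numbered tokens with a fixed lifetime."""
    counter = 0

    def issue(now: float) -> Token:
        nonlocal counter
        counter += 1
        return Token(value=f"token-{counter}", expires_at=now + lifetime)

    return issue


def exposure_window(token: Token, leaked_at: float) -> float:
    """Seconds during which a token leaked at `leaked_at` remains usable."""
    return max(0.0, token.expires_at - leaked_at)
```

A static credential corresponds to a token whose `expires_at` is effectively unbounded, so its exposure window lasts until someone notices the leak and rotates it by hand.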
Why is this important now: Customers increasingly prefer short-lived token authentication for API access in their Azure accounts and are starting to enforce it via policies. A workload / product that doesn't support it typically requires an exception from the customer's InfoSec team. Currently, support for this authentication method is fragmented across the layered product portfolio and the catalog of optional, OLM-managed operators. Only a subset support some form of short-lived token usage, sometimes only for a cloud provider other than Azure. The configuration experience to enable it varies between operators and is typically manual. If left unchanged, this could be seen as an adoption barrier. Additionally, Hypershift-based deployments are going to use short-lived token authentication exclusively and will have no first-class support for operators that do not support it.
Outcomes:
- As a customer of OpenShift layered products, I need to be able to fluidly, reliably and consistently install and use OpenShift layered product Kubernetes Operators in my clusters, while leveraging short-lived token authentication throughout my deployment.
- As a customer of OpenShift on Azure, overall I expect OpenShift as a platform to function equally well with tokenized authentication as it does with static, long-lived credentials. I expect the same from the Kubernetes Operators under the Red Hat brand (that need to reach cloud APIs) in that tokenized workflows are equally integrated and workable.
- We are driving use of layered products by making their adoption on an Azure WIF-enabled cluster as simple as using the cluster's core operators.
- On Hypershift, where the only credential mode for clusters/customers is short-lived token authentication, the Red Hat branded Operators that must reach the Azure APIs should be enabled to work with short-lived credentials in a consistent and automated fashion that allows customers to use those operators as easily as possible, driving the use of layered products.
Current Situation:
- OLM-managed operators today are unable to request cloud credentials via OpenShift's Credential API when installed on a cluster with short-lived authentication enabled (see OCPBU-8 for Azure-related work). The CloudCredentialOperator component that would be used for this currently has no support for OLM-managed operators.
- On ARO, to enable an OLM-managed operator deployed in a managed cluster, customers are required to register the operator with OCM (OpenShift Cluster Manager) before installing it via OperatorHub or directly on the cluster.
- Users are unaware of which operators request credentials
- Users are not warned that operator installation will fail
- Users subsequently are unaware why the installation failed
- Operators time out waiting for credentials
- Users should be informed of steps required for a successful installation
- CloudCredentialOperator (CCO) doesn’t exist in HyperShift as of today
- Any configuration for short-lived token support for OLM-managed operator installation is currently command-line only
Execution Plan:
Some of the workstreams below will run in parallel. Proper product documentation and QE are part of all of them.
Workstream 2 - CloudCredentialOperator-based flow for OLM-managed operators and Azure Identity (OCPBU-560)
- CCO gets a new mode in which it can reconcile Azure Workload Identity credential requests for OLM-managed operators
- A standardized flow is leveraged to guide users in preparing their Azure IAM policies and roles with permissions that are required for OLM-managed operators
- A standardized flow is defined in which users can configure OLM-managed operators to leverage Azure Identity
- An example operator is used to demonstrate the end-to-end functionality
- This will not be backported
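As a rough illustration of the credential request an OLM-managed operator might hand to CCO in this flow, the sketch below assembles a `CredentialsRequest` manifest as a plain Python dictionary. The field names (`cloudTokenPath`, `serviceAccountNames`, `AzureProviderSpec`, `roleBindings`, `azureClientID`, `azureTenantID`, `azureSubscriptionID`) reflect my understanding of the CCO API and should be checked against the CCO documentation before use. The helper `azure_credentials_request`, the token path default, and all names and IDs in the usage note are hypothetical.

```python
import json
from typing import Dict, Iterable


def azure_credentials_request(
    name: str,
    operator_namespace: str,
    service_accounts: Iterable[str],
    roles: Iterable[str],
    secret_name: str,
    client_id: str,
    tenant_id: str,
    subscription_id: str,
    # Assumed location of the projected service account token.
    token_path: str = "/var/run/secrets/openshift/serviceaccount/token",
) -> Dict:
    """Build a CredentialsRequest-shaped dict (field names are assumptions)."""
    return {
        "apiVersion": "cloudcredential.openshift.io/v1",
        "kind": "CredentialsRequest",
        "metadata": {
            "name": name,
            "namespace": "openshift-cloud-credential-operator",
        },
        "spec": {
            # Secret that CCO would populate for the operator to consume.
            "secretRef": {"name": secret_name, "namespace": operator_namespace},
            "serviceAccountNames": list(service_accounts),
            "cloudTokenPath": token_path,
            "providerSpec": {
                "apiVersion": "cloudcredential.openshift.io/v1",
                "kind": "AzureProviderSpec",
                "roleBindings": [{"role": role} for role in roles],
                # Identity details supplied by the user during the guided install.
                "azureClientID": client_id,
                "azureTenantID": tenant_id,
                "azureSubscriptionID": subscription_id,
            },
        },
    }


def to_manifest(request: Dict) -> str:
    """Serialize to JSON, which Kubernetes API clients accept as well as YAML."""
    return json.dumps(request, indent=2, sort_keys=True)
```

For example, `to_manifest(azure_credentials_request("example-operator", "example-ns", ["example-sa"], ["Contributor"], "example-azure-creds", "<client-id>", "<tenant-id>", "<subscription-id>"))` yields a manifest that could be reviewed before applying it; every value in that call is a placeholder.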
Workstream 4 - Azure Identity enablement for critical OLM-managed operators (OCPBU-564)
- based on Workstream 2, the following operators will be enabled to support the standard configuration flow for Azure Identity:
- AFS Operator
- OADP
- Cluster Logging
- these operators only support this flow on OCP 4.14 or newer
Workstream 7 - Continued Azure Identity enablement for OLM-managed operators (OCPBU-569)
- Short-lived token authentication using Azure Identity for: 3Scale, RHODS, RHODA, ODF, ACM, Ansible Automation Platform
- these operators only support this flow on OCP 4.14 or newer
Definition of done:
- Main success scenario - high-level user story
- customer creates an ARO or ARO HCP cluster
- customer wants basic (table-stakes) features such as CSI Drivers, OADP or Logging
- customer discovers the cluster is in Azure Identity mode and the desired operators are Azure WIF-capable
- customer sees necessary tasks for preparing for the operator in OperatorHub from their cluster
- customer prepares IAM roles/policies in anticipation of the Operator they want, using what they get from OperatorHub
- customer provides a very minimal set of parameters (Azure role(s) with policy) to the Operator's OperatorHub page
- The cluster can automatically set up the Operator, using the provided tokenized credentials, and the Operator functions as expected
- Cluster and Operator upgrades are taken into account and automated
- The above steps 1-7 should apply similarly for Google Cloud and Microsoft Azure Cloud, with their respective token-based workload identity systems.
- Managed OpenShift scenarios - high-level user story
- The same as above, but the ROSA CLI would assist with AWS role/policy creation
- The same as above, but the oc CLI would assist with cloud role/policy management (per respective cloud provider for the cluster)
Desired effect:
- Growth is the acquisition of net new usage of the platform. This can be new workloads not previously able to be supported, new markets not previously considered, or new end users not previously served.
- Retention is maintaining and expanding existing use of the platform. This can be more effective use of tools, competitive pressures, and ease of use improvements.
- Both growth and retention are effects of this effort.
- Customers have strict requirements around using only token-based cloud credential systems for workloads in their cloud accounts, which include OpenShift clusters in all forms.
- We gain new customers from both those that have waited for token-based auth/auth from OpenShift and from those that are new to OpenShift, with strict requirements around cloud account access
- We retain customers that are going through both cloud-native and hybrid-cloud journeys that all inevitably see security requirements driving them towards token-based auth/auth.
References
- DR-66: Guided operator installs
- Design Document: STS enablement for operators on Managed OpenShift
- Operators & STS
- blocks
-
PROJQUAY-2390 STS protocol for S3 access
- Closed
- clones
-
OCPSTRAT-6 Tokenized Auth Enablement for OLM-managed Operators on AWS
- In Progress
- is depended on by
-
ACM-6424 Support the standardized STS configuration flow via OLM and CCO for ACM
- Backlog
- is related to
-
RFE-6687 Enable WIF support for ACS in GCP
- Rejected
-
OCPSTRAT-605 Ensure compatibility of layered operators for HCP (HyperShift)
- In Progress
- relates to
-
ACM-1775 Ability for RHACM to consume GCP WIF token
- Closed
-
OCPSTRAT-513 Azure managed identity with Azure AD workload identity for self-managed OpenShift
- Closed
-
OCPSTRAT-469 Install and upgrade OpenShift with GCP Workload Identity
- Closed
- links to