Outcome
Resolution: Done
Major
OCPSTRAT-27 OLM V1: Operators, Operator Lifecycle Management, and Operator Hub
50% To Do, 0% In Progress, 50% Done
Feature Overview
- Give cluster administrators a central, cluster-level API to manage extensions commonly referred to as operators
- While the underlying implementation of the functional requirements can be carried out by different controllers with different APIs, to the administrative user there should be a single, cluster-level API that represents an installed extension with all high-level controls
Goals
- Provide a central, easy-to-understand API for installing, configuring, updating and removing extensions
- Enable GitOps-driven workflow with a single, declarative control surface to install, update, configure and manage permissions of an extension
- Ease troubleshooting and debugging by coalescing all relevant output and status from behind-the-scenes components and processes in a top-level aggregate object
- Simplify status reporting of extension availability and readiness
- Provide a simple point of reference/input for other services / APIs in the system
- Compared to OLM 0.x, a cluster-level API reduces resource usage in the system and alleviates many of the problems associated with multiple, conflicting installs of the same extension at the namespace scope
- Unblock constraint/dependency resolution by resolving at the cluster-level instead of the namespace-level
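The goal of coalescing status from behind-the-scenes components into a top-level aggregate object could be sketched as follows. This is a minimal illustration only: `ComponentStatus`, `aggregate_status`, the component names and the reason strings are hypothetical and are not taken from the OLM v1 API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentStatus:
    """Hypothetical status reported by one behind-the-scenes component."""
    component: str  # illustrative name, e.g. "resolver"
    ready: bool
    message: str = ""


def aggregate_status(components):
    """Coalesce per-component statuses into one top-level summary."""
    failing = [c for c in components if not c.ready]
    return {
        "ready": not failing,
        "reason": "ComponentsNotReady" if failing else "AllComponentsReady",
        # Only failing components contribute messages, so the summary stays short.
        "messages": [f"{c.component}: {c.message}" for c in failing],
    }


summary = aggregate_status([
    ComponentStatus("resolver", True),
    ComponentStatus("bundle-unpacker", False, "image pull failed"),
])
print(summary)
```

A user or another service would then only need to read the aggregate summary rather than query each lower-level component.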
Functional Requirements
- See the OLM v1 roadmap for more information about the requirements referenced as F<number>: https://github.com/operator-framework/operator-controller/blob/main/docs/olmv1_roadmap.md
Requirement | Notes | isMvp? |
---|---|---|
Surfaces provided APIs and features | CRDs | Done |
Must interface / reference with existing catalogs | Done | |
Allows to declaratively install an extension | Done | |
Trigger extension updates when setting a desired target version | Done | |
Allows to trigger extension removal | Done | |
The API should be named `operator` | Done | |
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
Input into dependency / constraint resolution preview | F3 | NO |
Lists resolved dependency | F4 | NO |
Input into permission preview | F5 | NO |
Triggers and surfaces install / update preflight checks | OCPBU-108 / F6 | NO |
Allows to configure the auto-update policy | F8 | NO |
Report and emit events about available updates to the extensions | F9 | NO |
Is subject to request / approval flow of user-triggered extension install / updates | F11 | NO |
Non-privileged users can discover whether an extension is available to them in a namespace | F12 | NO |
Allows to specify the list of namespaces in which the extension should be granted its requested permissions | OCPPLAN-9681 / F13 | NO |
Surfaces extension permission escalation during updates | F14 | NO |
Allows to specify subsets of permissions the extension should get in a list of namespaces | F15 | NO |
Allows to trigger cascading extension removal | F17 | NO |
Supports extension installation from a standalone RukPak bundle | F18 | NO |
Triggers and surfaces results of constraint evaluation | OCPBU-112 / F19 | NO |
Reports aggregated extension health | F20 | NO |
Reports custom extension health | F21 | NO |
Allows to control rollback behavior | F23 | NO |
Support customization through overrides | F24 | NO |
Reports custom upgrade readiness | F25 | NO |
Supports canary style rollouts | F26 | NO |
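Requirement F14 (surfacing permission escalation during updates) can be illustrated with a simple set difference. The sketch below assumes permissions are modeled as `(api_group, resource, verb)` tuples, which is a simplification chosen for this example; it ignores wildcards and namespace scoping, and `escalated_permissions` is a hypothetical helper rather than part of OLM.

```python
def escalated_permissions(current, requested):
    """Return permissions in `requested` that are not already in `current`.

    Each permission is an (api_group, resource, verb) tuple. Wildcards and
    namespace scoping are deliberately not handled in this sketch.
    """
    return sorted(set(requested) - set(current))


# Permissions held by the installed version (illustrative values).
current = {("apps", "deployments", "get"), ("apps", "deployments", "list")}
# The update asks for everything it had before plus read access to secrets.
requested = current | {("", "secrets", "get")}

new_permissions = escalated_permissions(current, requested)
print(new_permissions)
```

A non-empty result is what would need to be surfaced to the administrator before the update proceeds.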
Behavioral Requirements
- See the behavioral requirements here: https://docs.google.com/document/d/1LX4dJMbSmuIMn98tCiBaONmTunWR6zdUJuH-8uZ8Cno/edit#
Requirement | Notes | isMvp? |
---|---|---|
Single API as the primary touch point for users during regular usage and troubleshooting | Lower-level APIs (RukPak, Deppy) should not need to be consulted except for diagnosis by support personnel or in rare edge cases | YES |
GitOps-friendliness | All major operations must be triggerable with a single PATCH | YES |
Declarative Control | All operations have to be carried out in a goal-seeking, eventually consistent fashion and should not be one-off attempts that need to be manually reset / retried | YES |
Force overrides | Provide declarative override switches that disable constraint and dependency resolution and ignore the update path to give SREs an escape hatch | YES |
Scalability and resource consumption | Consider the case where OLM is installed in a cluster with hundreds of extensions installed, thousands of namespaces and users | YES |
Event trail | All major operations and state changes should trigger a Kubernetes event associated with the extension in question to aid with troubleshooting and auditing | NO |
Human-readable status | For all in-transit, failure or success states we need to surface user-friendly state information that does not presume knowledge of OLM's internal workings or custom controllers | NO |
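The GitOps-friendliness requirement (every major operation triggered by a single PATCH) can be illustrated with JSON merge patch semantics (RFC 7386). The sketch below is an assumption-laden example: the object layout and the field names `packageName` and `version` are hypothetical stand-ins for whatever the `operator` API ends up exposing, and `merge_patch` is a local helper, not a client library call.

```python
import copy
import json


def merge_patch(target, patch):
    """Apply a JSON merge patch (RFC 7386 semantics) and return a new value."""
    if not isinstance(patch, dict):
        # A non-object patch replaces the target wholesale.
        return copy.deepcopy(patch)
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null removes the key
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


# Hypothetical installed extension; field names are illustrative only.
installed = {
    "kind": "Operator",
    "metadata": {"name": "example-extension"},
    "spec": {"packageName": "example-extension", "version": "1.0.0"},
}

# A single patch that changes only the desired target version.
patch = {"spec": {"version": "1.1.0"}}
updated = merge_patch(installed, patch)
print(json.dumps(updated["spec"]))
```

Under this model, an update is requested by changing the desired version in one patch, and the controllers are expected to converge on that state without further manual steps.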
Use Cases
Main Use Cases:
- Extension installation and removal
- Extension updates or downgrades
- Extension permission management (what can the extension do)
- Extension access management (who can use the extensions)
- Extension reporting on health and update readiness
- Update policy management
Alternative Use Cases:
- Constraint resolution status review
- Extension update availability
- Request / Approval Flow for non-cluster-admin initiated installs
- Canary style rollout control
- Customizing placement, environment variables, volume mounts, annotations and labels of extension components
Out of Scope
- Extension catalog discovery (F2) - show users all available extensions in all possible versions in all possible channels (likely a separate API / service leveraged by a UI / CLI plugin)
- Dependency Preview (F4) - show users upfront what a particular extension depends on (likely a separate API / service leveraged by a UI / CLI plugin)
- Permission preview (F5) - show users upfront what permissions a particular extension requires (likely a separate API / service leveraged by a UI / CLI plugin)
Background, and strategic fit
This is part of a larger effort to re-design vital parts of the OLM APIs and conceptual models to fit the use case of OLM in managed service environments, GitOps-controlled infrastructure and restrictive self-managed deployments in Enterprise environments. You can learn more about it here: https://docs.google.com/document/d/1LX4dJMbSmuIMn98tCiBaONmTunWR6zdUJuH-8uZ8Cno/edit?usp=sharing
Assumptions
- Operators are extensions supported by OLM that are handled at the cluster-level
- Constraint and dependency resolution during installation and updates happens at the cluster level
- Functionality to aid the user experience for interactive installs like permission and constraint preview is supplied via a separate service / API
Customer Considerations
- Administrative users / privileged users:
  - have unfettered access to this API and thus can install extensions on the cluster, escalating through OLM's permissions
  - are typically either cluster administrators or privileged users tasked with maintaining a particular service / extension on the cluster
  - are often SREs providing managed services enabled by the extension on the cluster
- Tenants / non-privileged users:
  - have only limited access to the cluster-level operator API, which is explicitly granted by a privileged user
  - have insight into what the extension represented by an instance of the cluster-level operator API provides
  - do not have the ability to self-sufficiently install extensions but can leverage a request / approval flow similar to that of certificate signing requests where some approvals might be given automatically
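The request / approval flow for non-privileged users, where some approvals might be given automatically, could look roughly like the following. This is a hypothetical sketch: `InstallRequest`, `Decision`, `review` and the allow list are invented for illustration and do not describe an existing OLM or Kubernetes API.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    APPROVED = "Approved"
    PENDING = "Pending"  # waits for a privileged user to decide


@dataclass(frozen=True)
class InstallRequest:
    """Hypothetical install request filed by a non-privileged user."""
    requester: str
    extension: str


def review(request, auto_approved_extensions):
    """Approve automatically if the extension is on the allow list."""
    if request.extension in auto_approved_extensions:
        return Decision.APPROVED
    return Decision.PENDING


# Allow list maintained by a privileged user (illustrative content).
allow_list = {"logging-extension"}

print(review(InstallRequest("tenant-a", "logging-extension"), allow_list))
print(review(InstallRequest("tenant-a", "storage-extension"), allow_list))
```

Requests that stay pending would be resolved by an administrative user, analogous to approving a certificate signing request.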
Documentation Considerations
- the nature of operators as cluster-wide extensions to the Kubernetes control plane needs to be highlighted, in contrast to the current notion of namespace-scoped operators that tenants can install self-sufficiently without any conflicts
- the lines of visibility and responsibility on extension handling need to be explained
- users need to be made aware of the way OLM resolves constraints and related extensions together, causing extensions to update in lockstep
- administrators need to know about the consequences of removing an operator in the form of a controller-based extension that ships CRDs, specifically about cascading removal that involves removing managed workloads
- admins are responsible for enforcing the least-privilege principle on the cluster, so they need to be introduced to the permission and access management concepts of the operator API
- blocks: CNV-34170 CNV as a Cluster Level Operator (New)
- is incorporated by: OCPSTRAT-1347 [GA release] Next-gen OLM (OLM v1) (In Progress)
- is related to: OCPSTRAT-1599 Increase OpenShift compatibility with GitOps (New)
- relates to: OCPSTRAT-27 OLM V1: Operators, Operator Lifecycle Management, and Operator Hub (In Progress)
- split to: OCPSTRAT-199 [Phase 1 MVP/Tech Preview] OLM 1.0 - Cluster-level Operator API (B1) (Closed)