Feature
Resolution: Done
Major
BU Product Work
OCPSTRAT-27 OLM V1: Operators, Operator Lifecycle Management, and Operator Hub
0% To Do, 0% In Progress, 100% Done
Program Call
Feature Overview
- Note: This feature tracks the progress of the Phase 1 MVP scope for OLM 1.0, i.e., the items marked "isMvp?: YES" in the Functional Requirements and Behavioral Requirements tables.
- Give cluster administrators a central, cluster-level API to manage extensions, commonly referred to as operators.
- While the underlying implementation of the functional requirements can be carried out by different controllers with different APIs, the administrative user should see a single, cluster-level API that represents an installed extension with all high-level controls.
Scenarios
Specifically for the Phase 1 MVP / Tech Preview in OCP 4.14, an OLM user can walk through the following scenarios with the OLM 1.0 top-level, user-facing API, either via the CLI or declaratively via GitOps:
- package discoverability:
  - see a list of the default OpenShift catalogs that exist on the cluster
  - see the packages/Operators that support the `AllNamespaces` mode (this relies on the `packagemanifest` API for now)
  - see the available `update channels` of a specific package/Operator in the selected catalog
  - see the available `versions` within an `update channel` of a specific package/Operator in the selected catalog
- package installation:
  - install a package/Operator that supports `AllNamespaces` mode from the selected catalog, specifying a non-latest `version` and an `update channel`
- package update:
  - see all available `versions` within an `update channel` of the installed package/Operator in the selected catalog
  - update the package/Operator to a specified (arbitrary) version available in the channel
- package removal:
  - uninstall the Operator
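The discovery and version-selection steps in these scenarios can be sketched as a small in-memory model. This is a minimal illustration only: the `Package` class, the helper functions, and the example package names and version numbers are hypothetical, and the code does not call the `packagemanifest` API or any real catalog service. It assumes plain `X.Y.Z` version strings.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    """Hypothetical, simplified view of one package in a catalog."""
    name: str
    install_modes: set[str]
    # channel name -> versions published in that channel (any order)
    channels: dict[str, list[str]] = field(default_factory=dict)


def version_key(version: str) -> tuple[int, ...]:
    """Order plain 'X.Y.Z' strings numerically (no pre-release handling)."""
    return tuple(int(part) for part in version.split("."))


def all_namespaces_packages(catalog: list[Package]) -> list[str]:
    """Discoverability: names of packages that support AllNamespaces mode."""
    return sorted(p.name for p in catalog if "AllNamespaces" in p.install_modes)


def versions_in_channel(pkg: Package, channel: str) -> list[str]:
    """Discoverability: versions available in one update channel, oldest first."""
    if channel not in pkg.channels:
        raise ValueError(f"{pkg.name} has no channel {channel!r}")
    return sorted(pkg.channels[channel], key=version_key)


def resolve_target(pkg: Package, channel: str, version: str) -> str:
    """Install/update: accept any published version in the channel, not only the latest."""
    if version not in versions_in_channel(pkg, channel):
        raise ValueError(f"{version} is not published in channel {channel!r}")
    return version


# Hypothetical catalog contents for illustration only.
catalog = [
    Package("example-operator", {"AllNamespaces", "OwnNamespace"},
            {"stable": ["1.2.0", "1.0.0", "1.1.0"]}),
    Package("namespaced-operator", {"OwnNamespace"}, {"stable": ["0.1.0"]}),
]

pkg = catalog[0]
print(all_namespaces_packages(catalog))        # ['example-operator']
print(versions_in_channel(pkg, "stable"))      # ['1.0.0', '1.1.0', '1.2.0']
print(resolve_target(pkg, "stable", "1.1.0"))  # 1.1.0 (a non-latest version)
```

Validating the requested version against the channel contents, rather than always picking the channel head, is what allows both the non-latest install and the arbitrary in-channel update described in the scenarios.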
Goals
- Provide a central, easy-to-understand API for installing, configuring, updating and removing extensions
- Enable GitOps-driven workflow with a single, declarative control surface to install, update, configure and manage permissions of an extension
- Ease troubleshooting and debugging by coalescing all relevant output and status from behind-the-scenes components and processes in a top-level aggregate object
- Simplified status reporting of extension availability and readiness
- A simple point of reference / input for other services / APIs in the system
- Compared to OLM 0.x, a cluster-level API reduces resource usage in the system and alleviates many of the problems associated with multiple, conflicting installs of the same extension at the namespace scope
- Unblock constraint / dependency resolution by resolving at the cluster-level instead of the namespace-level
Functional Requirements
- See the OLM v1 roadmap for more information about the requirements referenced as F<number>: https://github.com/operator-framework/operator-controller/blob/main/docs/olmv1_roadmap.md
Requirement | Notes | isMvp? |
---|---|---|
Surfaces provided APIs and features | CRDs | YES |
Must interface with / reference existing catalogs | | YES |
Allows declaratively installing an extension | | YES |
Triggers extension updates when a desired target version is set | | YES |
Allows triggering extension removal | | YES |
The API should be named `operator` | | YES |
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
Input into dependency / constraint resolution preview | F3 | NO |
Lists resolved dependencies | F4 | NO |
Input into permission preview | F5 | NO |
Triggers and surfaces install / update preflight checks | OCPBU-108 / F6 | NO |
Allows configuring the auto-update policy | F8 | NO |
Reports and emits events about available updates to the extensions | F9 | NO |
Is subject to a request / approval flow for user-triggered extension installs / updates | F11 | NO |
Non-privileged users can discover whether an extension is available to them in a namespace | F12 | NO |
Allows specifying the list of namespaces in which the extension should be granted its requested permissions | OCPPLAN-9681 / F13 | NO |
Surfaces extension permission escalation during updates | F14 | NO |
Allows specifying subsets of permissions the extension should get in a list of namespaces | F15 | NO |
Allows triggering cascading extension removal | F17 | NO |
Supports extension installation from a standalone RukPak bundle | F18 | NO |
Triggers constraint evaluation and surfaces its results | OCPBU-112 / F19 | NO |
Reports aggregated extension health | F20 | NO |
Reports custom extension health | F21 | NO |
Allows controlling rollback behavior | F23 | NO |
Supports customization through overrides | F24 | NO |
Reports custom upgrade readiness | F25 | NO |
Supports canary-style rollouts | F26 | NO |
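
The requirement to trigger an update by setting a desired target version fits a single declarative PATCH. A minimal sketch, assuming the client sends a JSON Merge Patch (RFC 7386) against the cluster-level `operator` object: the `spec.packageName`, `spec.channel`, and `spec.version` field names and all example values are assumptions for illustration, not a confirmed schema, and the `merge_patch` helper is a hypothetical local implementation of the RFC algorithm rather than a library call.

```python
import copy
import json
from typing import Any


def merge_patch(target: Any, patch: Any) -> Any:
    """Apply a JSON Merge Patch (RFC 7386) without mutating the inputs."""
    if not isinstance(patch, dict):
        # Non-object patch values replace the target wholesale.
        return copy.deepcopy(patch)
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            # null in the patch means "remove this member".
            result.pop(key, None)
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


# Hypothetical installed object; field names are assumptions.
installed = {
    "kind": "Operator",
    "metadata": {"name": "example-operator"},
    "spec": {"packageName": "example-operator", "channel": "stable", "version": "1.1.0"},
}

# One PATCH carries the whole update intent: a new desired version.
update = {"spec": {"version": "1.2.0"}}
desired = merge_patch(installed, update)
print(json.dumps(desired["spec"], sort_keys=True))
```

Setting a field to `null` in a merge patch removes that field from the object, so the same single-request mechanism can also clear optional settings.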
Behavioral Requirements
- See the behavioral requirements here: https://docs.google.com/document/d/1LX4dJMbSmuIMn98tCiBaONmTunWR6zdUJuH-8uZ8Cno/edit#
Requirement | Notes | isMvp? |
---|---|---|
Single API as the primary touch point for users during regular usage and troubleshooting | Lower-level APIs (RukPak, Deppy) should not need to be consulted except for diagnosis by support personnel or in rare edge cases | YES |
GitOps-friendliness | All major operations must be triggerable with a single PATCH | YES |
Declarative control | All operations have to be carried out in a goal-seeking, eventually consistent fashion and should not make one-off attempts that need to be manually reset / retried | YES |
Force overrides | Provide declarative override switches that disable constraint and dependency resolution and ignore the update path to give SREs an escape hatch | YES |
Scalability and resource consumption | Consider the case where OLM is installed in a cluster with hundreds of installed extensions and thousands of namespaces and users | YES |
Event trail | All major operations and state changes should trigger a Kubernetes event associated with the extension in question to aid with troubleshooting and auditing | NO |
Human-readable status | For all in-transit, failure, or success states, we need to surface user-friendly state information that does not presume knowledge of OLM's internal workings or custom controllers | NO |
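
The "Declarative control" requirement can be illustrated with a level-triggered reconcile function: each pass compares the desired version with what is observed and acts only on the difference, so a failed attempt is surfaced as status and retried on the next pass instead of needing a manual reset. This is a hypothetical sketch: `Observed`, `reconcile`, the status strings, and the simulated failure are illustrative names, not the operator-controller implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Observed:
    """Hypothetical observed state of one installed extension."""
    version: Optional[str] = None  # None means "not installed"


def reconcile(desired: Optional[str], observed: Observed,
              apply: Callable[[Optional[str]], None]) -> str:
    """One level-triggered pass; desired=None models removal.

    Errors become a status message instead of a terminal "failed" state,
    so the next pass simply tries again.
    """
    if observed.version == desired:
        return "Ready"  # already converged: nothing to do
    try:
        apply(desired)
    except RuntimeError as err:
        return f"Progressing: {err}"
    observed.version = desired
    return "Ready"


attempts = {"n": 0}


def flaky_apply(version: Optional[str]) -> None:
    """Simulated apply step that fails only on its first call."""
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise RuntimeError("simulated transient failure")


state = Observed(version="1.1.0")
statuses = [reconcile("1.2.0", state, flaky_apply) for _ in range(3)]
print(statuses)
```

Because `reconcile` derives its action from current state rather than from a record of past attempts, running it repeatedly is safe: once the observed version matches the desired one, further passes are no-ops.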
Use Cases
Main Use Cases:
- Extension installation and removal
- Extension updates or downgrades
- Extension permission management (what can the extension do)
- Extension access management (who can use the extension)
- Extension reporting on health and update readiness
- Update policy management
Alternative Use Cases:
- Constraint resolution status review
- Extension update availability
- Request / Approval Flow for non-cluster-admin initiated installs
- Canary style rollout control
- Customizing placement, environment variables, volume mounts, annotations and labels of extension components
Out of Scope
- Extension catalog discovery (F2) - show users all available extensions in all possible versions in all possible channels (likely a separate API / service leveraged by the UI / CLI plugin)
- Dependency preview (F4) - show users upfront what a particular extension depends on (likely a separate API / service leveraged by the UI / CLI plugin)
- Permission preview (F5) - show users upfront what permissions a particular extension requires (likely a separate API / service leveraged by the UI / CLI plugin)
Background, and strategic fit
This is part of a larger effort to re-design vital parts of the OLM APIs and conceptual models to fit the use case of OLM in managed service environments, GitOps-controlled infrastructure and restrictive self-managed deployments in Enterprise environments. You can learn more about it here: https://docs.google.com/document/d/1LX4dJMbSmuIMn98tCiBaONmTunWR6zdUJuH-8uZ8Cno/edit?usp=sharing
Assumptions
- Operators are extensions supported by OLM that are handled at the cluster-level
- Constraint and dependency resolution during installation and updates happens at the cluster level
- Functionality to aid the user experience for interactive installs like permission and constraint preview is supplied via a separate service / API
Customer Considerations
- Administrative users / privileged users:
  - have unfettered access to this API and thus can install extensions on the cluster, escalating through OLM's permissions
  - are typically either cluster administrators or privileged users tasked with maintaining a particular service / extension on the cluster
  - are often SREs providing managed services enabled by the extension on the cluster
- Tenants / non-privileged users:
  - have only limited access to the cluster-level operator API, which is explicitly granted by a privileged user
  - have insight into what the extension represented by an instance of the cluster-level operator API provides
  - do not have the ability to self-sufficiently install extensions but can leverage a request / approval flow, similar to that of certificate signing requests, where some approvals might be given automatically
Documentation Considerations
- The nature of operators as cluster-wide extensions to the Kubernetes control plane needs to be highlighted, in contrast to the current notion of namespace-scoped operators, which tenants can install self-sufficiently without any conflicts.
- The lines of visibility and responsibility for extension handling need to be explained.
  - In 4.14 TP, there are no controls for the visibility of an operator. It is up to the cluster admins to manage the lifecycle of operators, as described in the scenarios in the doc sections of:
    - OCPSTRAT-337 [Phase 1 MVP/Tech Preview] OLM 1.0 - Extension Catalogs (F1)
    - OCPSTRAT-443 [Phase 1 MVP/Tech Preview] OLM 1.0 - Extension Installation (F7)
    - OCPSTRAT-450 [Phase 1 MVP/Tech Preview] OLM 1.0 - Extension updates (F10)
    - OCPSTRAT-411 [Phase 1 MVP/Tech Preview] OLM 1.0 - Extension Removal (F16)
- Users need to be made aware of the way OLM resolves constraints and related extensions together, causing extensions to update in lockstep.
  - In 4.14 TP, OLM does not yet fully support operator updates that take dependencies on other operators into account. The supported installation and update flows are the scenarios described in the doc sections of:
    - OCPSTRAT-443 [Phase 1 MVP/Tech Preview] OLM 1.0 - Extension Installation (F7)
    - OCPSTRAT-450 [Phase 1 MVP/Tech Preview] OLM 1.0 - Extension updates (F10)
- Administrators need to know about the consequences of removing an operator in the form of a controller-based extension that ships CRDs, specifically about cascading removal, which involves removing managed workloads.
  - In 4.14 TP, removing an operator cleans up the operator's resources, including the CRDs it provides, as described in the scenarios in the doc section of:
    - OCPSTRAT-411 [Phase 1 MVP/Tech Preview] OLM 1.0 - Extension Removal (F16)
- Admins are responsible for ensuring the principle of least privilege on the cluster; they need to be exposed to the permission and access management concepts of the operator API.
  - In 4.14 TP, permission and access management controls for the operator API are not yet supported.
- depends on
  - OPRUN-2770 [upstream] Deppy/Operator API Integration (status: Dev Complete)
- is blocked by
  - OCPSTRAT-337 [Phase 1 MVP/Tech Preview] OLM 1.0 - Extension Catalogs (F1) (status: Closed)
  - OCPSTRAT-411 [Phase 1 MVP/Tech Preview] OLM 1.0 - Extension Removal (F16) (status: Closed)
  - OCPSTRAT-443 [Phase 1 MVP/Tech Preview] OLM 1.0 - Extension Installation (F7) (status: Closed)
  - OCPSTRAT-450 [Phase 1 MVP/Tech Preview] OLM 1.0 - Extension updates (F10) (status: Closed)
- relates to
  - RFE-1329 Capability to install a specific version of operator from the Operator hub and to downgrade the operator (status: Deferred)
- split from
  - OCPSTRAT-268 [GA release] OLM 1.0 Behavioral requirements - Simplified API control surface (status: New)
- links to