Details
-
Feature
-
Resolution: Unresolved
-
Major
-
None
-
None
-
None
Description
Feature Overview
- Give cluster administrators a central, cluster-level API to manage extensions commonly referred to as operators
- While the underlying implementation of the functional requirements can be carried out by different controllers with different APIs, to the administrative user there should be a single, cluster-level API that represents an installed extension with all high-level controls
Goals
- Provide a central, easy-to-understand API for installing, configuring, updating and removing extensions
- Enable GitOps-driven workflow with a single, declarative control surface to install, update, configure and manage permissions of an extension
- Ease troubleshooting and debugging by coalescing all relevant output and status from behind-the-scenes components and processes in a top-level aggregate object
- Simplify status reporting of extension availability and readiness
- Provide a simple point of reference / input for other services and APIs in the system
- Compared to OLM 0.x, a cluster-level API reduces resource usage in the system and alleviates many of the problems associated with multiple, conflicting installs of the same extension at the namespace scope
- Unblock constraint/dependency resolution by resolving at the cluster-level instead of the namespace-level
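The single declarative control surface described in the goals above can be illustrated with a minimal sketch of what a cluster-scoped extension object might look like. The API group/version, the `Operator` kind, the `packageName`, `version` and `channel` fields, and the helper `operator_manifest` are assumptions made for illustration; this feature description does not define a schema.

```python
import json


def operator_manifest(name, package, version=None, channel=None):
    """Build a hypothetical cluster-level extension manifest as a plain dict.

    All field names here are illustrative assumptions, not a confirmed schema.
    """
    spec = {"packageName": package}
    if version is not None:
        spec["version"] = version  # desired target version
    if channel is not None:
        spec["channel"] = channel
    return {
        "apiVersion": "operators.operatorframework.io/v1alpha1",  # assumed
        "kind": "Operator",  # assumed kind name
        "metadata": {"name": name},  # cluster-scoped: no namespace field
        "spec": spec,
    }


manifest = operator_manifest("my-extension", "my-extension-package", version="1.2.3")
print(json.dumps(manifest, indent=2))
```

Because the object carries only desired state, the same document could be stored in a Git repository and applied unchanged by a GitOps tool.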
Functional Requirements
- See the functional requirements in the roadmap for more information about the requirements referenced by F<number>: https://github.com/operator-framework/operator-controller/blob/main/docs/olmv1_roadmap.md
Requirement | Notes | isMvp? |
---|---|---|
Surfaces provided APIs and features | CRDs | Done |
Must interface / reference with existing catalogs | Done | |
Allows declarative installation of an extension | Done | |
Triggers extension updates when a desired target version is set | Done | |
Allows triggering extension removal | Done | |
The API should be named `operator` | Done | |
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
Input into dependency / constraint resolution preview | F3 | NO |
Lists resolved dependency | F4 | NO |
Input into permission preview | F5 | NO |
Triggers and surfaces install / update preflight checks | OCPBU-108 / F6 | NO |
Allows configuring the auto-update policy | F8 | NO |
Report and emit events about available updates to the extensions | F9 | NO |
Is subject to request / approval flow of user-triggered extension install / updates | F11 | NO |
Non-privileged users can discover whether an extension is available to them in a namespace | F12 | NO |
Allows specifying the list of namespaces in which the extension should be granted its requested permissions | OCPPLAN-9681 / F13 | NO |
Surfaces extension permission escalation during updates | F14 | NO |
Allows specifying subsets of permissions the extension should get in a list of namespaces | F15 | NO |
Allows triggering cascading extension removal | F17 | NO |
Supports extension installation from a standalone RukPak bundle | F18 | NO |
Triggers constraint evaluation and surfaces its results | OCPBU-112 / F19 | NO |
Reports aggregated extension health | F20 | NO |
Reports custom extension health | F21 | NO |
Allows controlling rollback behavior | F23 | NO |
Support customization through overrides | F24 | NO |
Reports custom upgrade readiness | F25 | NO |
Supports canary style rollouts | F26 | NO |
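The aggregated health requirement (F20), together with the goal of coalescing status from behind-the-scenes components into one top-level object, can be sketched as follows. The `Condition` type, the component names, and the `aggregate` function are hypothetical; they only illustrate the idea of rolling several component reports up into one summary.

```python
from dataclasses import dataclass


@dataclass
class Condition:
    source: str        # hypothetical reporting component, e.g. "resolver"
    type: str          # hypothetical condition type, e.g. "Resolved"
    status: bool       # True when the condition is satisfied
    message: str = ""  # free-form detail from the component


def aggregate(conditions):
    """Roll component conditions up into a single top-level summary."""
    if not conditions:
        return {"ready": False, "message": "no status reported yet"}
    failing = [c for c in conditions if not c.status]
    if not failing:
        return {"ready": True, "message": "all components report healthy"}
    details = "; ".join(
        f"{c.source}/{c.type}: {c.message or 'not satisfied'}" for c in failing
    )
    return {"ready": False, "message": details}


summary = aggregate([
    Condition("resolver", "Resolved", True),
    Condition("installer", "Installed", False, "bundle unpack failed"),
])
print(summary)
```

A user would then only need to read the summary on the top-level object, with the lower-level detail carried in the message.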
Behavioral Requirements
- See the behavioral requirements here: https://docs.google.com/document/d/1LX4dJMbSmuIMn98tCiBaONmTunWR6zdUJuH-8uZ8Cno/edit#
Requirement | Notes | isMvp? |
---|---|---|
Single API as the primary touch point for users during regular usage and troubleshooting | Lower-level APIs (RukPak, Deppy) should not need to be consulted except for diagnosis by support personnel or in rare edge cases | YES |
GitOps-friendliness | All major operations need to be able to be triggered with a single PATCH | YES |
Declarative Control | All operations have to be carried out in a goal-seeking, eventually consistent fashion and should not conduct one-off attempts that need to be manually reset / retried | YES |
Force overrides | Provide declarative override switches that disable constraint and dependency resolution and ignore the update path to give SREs an escape hatch | YES |
Scalability and resource consumption | Consider the case where OLM is installed in a cluster with hundreds of extensions installed, thousands of namespaces and users | YES |
Event trail | All major operations and state changes should trigger a Kubernetes event associated with the extension in question to aid with troubleshooting and auditing | NO |
Human readable status | For all in-transit, failure or success states we need to surface user-friendly state information that does not presume knowledge of OLM's internal workings or custom controllers | NO |
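The GitOps-friendliness and Declarative Control requirements above can be illustrated together. The sketch below applies a single JSON merge patch (RFC 7396 semantics) to a desired-state object and then runs one goal-seeking reconcile step. The object layout, the `installedVersion` field, and the `reconcile` function are assumptions for illustration and do not describe the actual controller.

```python
import copy


def merge_patch(target, patch):
    """Apply a JSON merge patch (RFC 7396 semantics) and return a new object."""
    if not isinstance(patch, dict):
        return copy.deepcopy(patch)
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null removes the member
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


def reconcile(desired_spec, observed):
    """One goal-seeking step: return the next action, or None when converged."""
    want = desired_spec.get("version")
    have = observed.get("installedVersion")  # hypothetical status field
    if have is None:
        return ("install", want)
    if have != want:
        return ("update", want)
    return None


# Hypothetical desired state and observed state.
operator = {"spec": {"packageName": "my-extension-package", "version": "1.2.3"}}
observed = {"installedVersion": "1.2.3"}

# A single PATCH changes the target version; nothing else is touched.
operator = merge_patch(operator, {"spec": {"version": "1.3.0"}})
action = reconcile(operator["spec"], observed)
print(action)
```

The reconcile step is derived purely from the difference between desired and observed state, so it can be repeated until it returns `None` without any manual reset.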
Use Cases
Main Use Cases:
- Extension installation and removal
- Extension updates or downgrades
- Extension permission management (what can the extension do)
- Extension access management (who can use the extensions)
- Extension reporting on health and update readiness
- Update policy management
Alternative Use Cases:
- Constraint resolution status review
- Extension update availability
- Request / Approval Flow for non-cluster-admin initiated installs
- Canary style rollout control
- Customizing placement, environment variables, volume mounts, annotations and labels of extension components
Out of Scope
- Extension catalog discovery (F2) - show users all available extensions in all possible versions in all possible channels (likely a separate API / service leveraged by a UI / CLI plugin)
- Dependency Preview (F4) - show users upfront what a particular extension depends on (likely a separate API / service leveraged by a UI / CLI plugin)
- Permission preview (F5) - show users upfront what permissions a particular extension requires (likely a separate API / service leveraged by a UI / CLI plugin)
Background, and strategic fit
This is part of a larger effort to re-design vital parts of the OLM APIs and conceptual models to fit the use case of OLM in managed service environments, GitOps-controlled infrastructure and restrictive self-managed deployments in Enterprise environments. You can learn more about it here: https://docs.google.com/document/d/1LX4dJMbSmuIMn98tCiBaONmTunWR6zdUJuH-8uZ8Cno/edit?usp=sharing
Assumptions
- Operators are extensions supported by OLM that are handled at the cluster-level
- Constraint and dependency resolution during installation and updates happens at the cluster level
- Functionality to aid the user experience for interactive installs like permission and constraint preview is supplied via a separate service / API
Customer Considerations
- Administrative users / privileged users:
  - have unfettered access to this API and can thus install extensions on the cluster, escalating through OLM's permissions
  - are typically either cluster administrators or privileged users tasked with maintaining a particular service / extension on the cluster
  - are often SREs providing managed services enabled by the extension on the cluster
- Tenants / non-privileged users:
  - have only limited access to the cluster-level operator API, which is explicitly granted by a privileged user
  - have insight into what the extension represented by an instance of the cluster-level operator API provides
  - do not have the ability to install extensions self-sufficiently but can leverage a request / approval flow similar to that of certificate signing requests, where some approvals might be given automatically
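The request / approval flow for non-privileged users mentioned above could look roughly like the following sketch, loosely modeled on the pending / approved / denied lifecycle of certificate signing requests. `InstallRequest`, `review`, and the notion of an auto-approved package list are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class InstallRequest:
    requester: str          # non-privileged user asking for the install
    package: str            # extension being requested
    state: str = "Pending"  # Pending -> Approved | Denied


def review(request, auto_approved_packages, admin_decision=None):
    """Advance a pending request; decided requests are left unchanged."""
    if request.state != "Pending":
        return request
    if request.package in auto_approved_packages:
        request.state = "Approved"  # automatic approval by policy
    elif admin_decision is not None:
        request.state = "Approved" if admin_decision else "Denied"
    return request  # stays Pending until a privileged user decides


auto = {"logging-extension"}
first = review(InstallRequest("tenant-a", "logging-extension"), auto)
second = review(InstallRequest("tenant-b", "database-extension"), auto)
third = review(InstallRequest("tenant-b", "database-extension"), auto, admin_decision=False)
print(first.state, second.state, third.state)
```

Keeping the request as its own object means tenants never need write access to the cluster-level API itself.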
Documentation Considerations
- the nature of operators being a cluster-wide extension to the Kubernetes control plane needs to be highlighted, in contrast to the current notion of namespace-scoped operators, which tenants can install self-sufficiently without any conflicts
- the lines of visibility and responsibility on extension handling need to be explained
- users need to be made aware of the way OLM resolves constraints and related extensions together, causing extensions to update in lockstep
- administrators need to know about the consequences of removing an operator in the form of a controller-based extension that ships CRDs, specifically about cascading removal that involves removing managed workloads
- admins are responsible for ensuring the least-privilege principle on the cluster, so they need to be introduced to the permission and access management concepts of the operator API
Issue Links
- blocks: CNV-34170 CNV as a Cluster Level Operator (New)
- split to: OCPSTRAT-199 [Phase 1 MVP/Tech Preview] OLM 1.0 - Cluster-level Operator API (B1) (Closed)