OCPSTRAT-199

[Phase 1 MVP/Tech Preview] OLM 1.0 - Cluster-level Operator API (B1)

Details

    • OCPSTRAT-27 Operators and Operator Lifecycle Management and Operator Hub
    • Program Call

    Description

      Feature Overview

      • Note: This feature tracks the progress of the Phase 1 MVP scope for OLM 1.0
        • i.e., the requirements marked "isMvp?: YES" in the Functional Requirements and Behavioral Requirements tables.
      • Give cluster administrators a central, cluster-level API to manage extensions, commonly referred to as operators
      • While the underlying implementation of the functional requirements can be carried out by different controllers with different APIs, to the administrative user there should be a single, cluster-level API that represents an installed extension with all high-level controls

      Scenarios

      Specifically for MVP Phase 1/Tech Preview in OCP 4.14, an OLM user can walk through the following scenario declaratively, via the CLI or GitOps, using the OLM 1.0 top-level, user-facing API:

      • package discoverability:
        • see a list of the existing OpenShift default catalogs on the cluster.
        • see packages/Operators that support the `AllNamespaces` mode (this relies on the `packagemanifest` API for now)
        • see the available `update channels` of a specific package/Operator in the selected catalog
        • see the available `versions` within an `update channel` of a specific package/Operator in the selected catalog
      • package installation:
        • install a package/Operator that supports `AllNamespaces` mode from the selected catalog by specifying an `update channel` and a non-latest `version`
      • package update:
        • see all available `versions` within an `update channel` of the installed package/Operator in the selected catalog
        • update the package/Operator to the specified (arbitrary) version available in the channel
      • package removal:
        • uninstall the Operator.
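
      The scenario above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the OLM 1.0 implementation: the catalog contents, the `example-operator` package, the helper functions, the `apiVersion` value, and the manifest field names (`packageName`, `channel`, `version`) are all assumed for the example and may differ from the shipped API.

```python
# Hypothetical in-memory catalog: package -> channel -> versions (oldest first).
# Real catalogs are served by the cluster; this data is made up for illustration.
CATALOG = {
    "example-operator": {
        "stable": ["1.0.0", "1.1.0", "1.2.0"],
        "candidate": ["1.2.0", "1.3.0"],
    },
}


def list_channels(package):
    """Return the update channels of a package, sorted by name."""
    return sorted(CATALOG[package])


def list_versions(package, channel):
    """Return the versions available in one update channel."""
    return list(CATALOG[package][channel])


def build_operator_manifest(name, package, channel, version):
    """Build a declarative manifest for a cluster-level operator object.

    The apiVersion, kind, and spec field names are assumptions for this sketch.
    """
    if version not in list_versions(package, channel):
        raise ValueError(f"{package} {version} is not in channel {channel!r}")
    return {
        "apiVersion": "operators.operatorframework.io/v1alpha1",  # assumed
        "kind": "Operator",
        "metadata": {"name": name},  # cluster-scoped, so no namespace
        "spec": {"packageName": package, "channel": channel, "version": version},
    }


# Install a non-latest version, then update by changing only the desired version.
installed = build_operator_manifest("example", "example-operator", "stable", "1.0.0")
updated = build_operator_manifest("example", "example-operator", "stable", "1.2.0")
```

      In this sketch, an update is the same manifest with a different `version`, and removal would correspond to deleting the object, which is what makes the flow suitable for a GitOps repository.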

      Goals

      • Provide a central, easy-to-understand API for installing, configuring, updating and removing extensions
      • Enable GitOps-driven workflow with a single, declarative control surface to install, update, configure and manage permissions of an extension
      • Ease troubleshooting and debugging by coalescing all relevant output and status from behind-the-scenes components and processes in a top-level aggregate object
      • Simplify status reporting of extension availability and readiness
      • Provide a simple point of reference / input for other services / APIs in the system
      • Compared to OLM 0.x, a cluster-level API reduces resource usage in the system and alleviates many of the problems associated with multiple, conflicting installs of the same extension at the namespace scope
      • Unblock constraint / dependency resolution by resolving at the cluster-level instead of the namespace-level
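
      The goal of coalescing status from behind-the-scenes components into one top-level object can be illustrated with a small Python sketch. The component names and the `ready` / `message` condition shape are hypothetical; the actual status fields of the API are not defined in this document.

```python
def aggregate_status(component_conditions):
    """Collapse per-component conditions into one top-level summary.

    component_conditions maps a component name to a dict with the keys
    'ready' (bool) and 'message' (str). This shape is an assumption.
    """
    failing = {
        name: cond["message"]
        for name, cond in component_conditions.items()
        if not cond["ready"]
    }
    if not failing:
        return {"ready": True, "message": "all components ready"}
    # Sort by component name so the aggregate message is deterministic.
    details = "; ".join(f"{name}: {msg}" for name, msg in sorted(failing.items()))
    return {"ready": False, "message": details}


# Hypothetical lower-level components reporting into the aggregate object.
summary = aggregate_status({
    "resolver": {"ready": True, "message": "resolved"},
    "bundle-unpack": {"ready": False, "message": "image pull failed"},
})
```

      A user reading `summary` sees a single readiness flag plus the reason for any failure, without having to inspect each lower-level component.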

      Functional Requirements

      Requirement | Notes | isMvp?
      --- | --- | ---
      Surfaces provided APIs and features | CRDs | YES
      Must interface / reference with existing catalogs | OCPBU-107 / F1 | YES
      Allows to declaratively install an extension | OCPBU-109 / F7 | YES
      Trigger extension updates when setting a desired target version | OCPBU-549 / F10 (Phase 1 scope) | YES
      Allows to trigger extension removal | OCPBU-111 / F16 | YES
      The API should be named `operator` | | YES
      CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES
      Release Technical Enablement | Provide necessary release enablement details and documents. | YES
      Input into dependency / constraint resolution preview | F3 | NO
      Lists resolved dependency | F4 | NO
      Input into permission preview | F5 | NO
      Triggers and surfaces install / update preflight checks | OCPBU-108 / F6 | NO (was YES)
      Allows to configure the auto-update policy | F8 | NO
      Report and emit events about available updates to the extensions | F9 | NO
      Is subject to request / approval flow of user-triggered extension install / updates | F11 | NO
      Non-privileged users can discover whether an extension is available to them in a namespace | F12 | NO (was YES)
      Allows to specify the list of namespaces in which the extension should be granted its requested permissions | OCPPLAN-9681 / F13 | NO (was YES)
      Surfaces extension permission escalation during updates | F14 | NO
      Allows to specify subsets of permissions the extension should get in a list of namespaces | F15 | NO
      Allows to trigger cascading extension removal | F17 | NO
      Supports extension installation from a standalone RukPak bundle | F18 | NO
      Triggers and surfaces results of constraint evaluation | OCPBU-112 / F19 | NO (was YES)
      Reports aggregated extension health | F20 | NO (was YES)
      Reports custom extension health | F21 | NO
      Allows to control rollback behavior | F23 | NO
      Support customization through overrides | F24 | NO
      Reports custom upgrade readiness | F25 | NO
      Supports canary style rollouts | F26 | NO

      Behavioral Requirements

      Requirement | Notes | isMvp?
      --- | --- | ---
      Single API as the primary touch point for users during regular usage and troubleshooting | Lower-level APIs (RukPak, Deppy) should not need to be consulted except for diagnosis by support personnel or in rare edge cases | YES
      GitOps-friendliness | All major operations need to be able to be triggered with a single PATCH | YES
      Declarative Control | All operations have to be carried out in a goal-seeking, eventually consistent fashion and should not conduct one-off attempts that need to be manually reset / retried | YES
      Force overrides | Provide declarative override switches that disable constraint and dependency resolution and ignore the update path to give SREs an escape hatch | YES
      Scalability and resource consumption | Consider the case where OLM is installed in a cluster with hundreds of extensions installed, thousands of namespaces and users | YES
      Event trail | All major operations and state changes should trigger a Kubernetes event associated with the extension in question to aid with troubleshooting and auditing | NO
      Human readable status | For all in-transit, failure or success states we need to surface user-friendly state information that does not presume knowledge of OLM's internal workings or custom controllers | NO
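
      The "single PATCH" and "goal-seeking" requirements can be made concrete with a short Python sketch. It applies a JSON merge patch (RFC 7386 semantics) to a desired-state object and then computes one idempotent reconcile step. The object shape, the field names, and the action names returned by `reconcile` are assumptions for illustration only, not the behavior of the actual controllers.

```python
import copy


def merge_patch(target, patch):
    """Apply a JSON merge patch (RFC 7386 semantics) and return a new object."""
    if not isinstance(patch, dict):
        return copy.deepcopy(patch)  # non-object patches replace the target
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null removes the key
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


def reconcile(desired_spec, observed):
    """One goal-seeking step: name the action that moves observed toward desired.

    Safe to call repeatedly; returns "none" once the two states match.
    """
    if desired_spec is None:
        return "uninstall" if observed is not None else "none"
    if observed is None:
        return "install"
    if observed["version"] != desired_spec["version"]:
        return "update"
    return "none"


# Hypothetical object; a single PATCH changes only the desired version.
operator = {"spec": {"packageName": "example-operator", "version": "1.0.0"}}
patched = merge_patch(operator, {"spec": {"version": "1.2.0"}})

# The cluster still runs 1.0.0, so the next step is an update.
action = reconcile(patched["spec"], {"version": "1.0.0"})
```

      Because `reconcile` only compares desired and observed state, a failed attempt does not need to be reset by hand: calling it again yields the same action until the states converge.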

      Use Cases

      Main Use Cases:

      • Extension installation and removal
      • Extension updates or downgrades
      • Extension permission management (what can the extension do)
      • Extension access management (who can use the extensions)
      • Extension reporting on health and update readiness
      • Update policy management

      Alternative Use Cases:

      • Constraint resolution status review
      • Extension update availability
      • Request / Approval Flow for non-cluster-admin initiated installs
      • Canary style rollout control
      • Customizing placement, environment variables, volume mounts, annotations and labels of extension components

      Out of Scope

      • Extension catalog discovery (F2) - show users all available extensions in all possible versions in all possible channels (likely a separate API / service leveraged by a UI / CLI plugin)
      • Dependency Preview (F4) - show users upfront what a particular extension depends on (likely a separate API / service leveraged by a UI / CLI plugin)
      • Permission preview (F5) - show users upfront what permissions a particular extension requires (likely a separate API / service leveraged by a UI / CLI plugin)

      Background, and strategic fit

      This is part of a larger effort to re-design vital parts of the OLM APIs and conceptual models to fit the use case of OLM in managed service environments, GitOps-controlled infrastructure and restrictive self-managed deployments in Enterprise environments. You can learn more about it here: https://docs.google.com/document/d/1LX4dJMbSmuIMn98tCiBaONmTunWR6zdUJuH-8uZ8Cno/edit?usp=sharing

      Assumptions

      • Operators are extensions supported by OLM that are handled at the cluster-level
      • Constraint and dependency resolution during installation and updates happens at the cluster level
      • Functionality to aid the user experience for interactive installs like permission and constraint preview is supplied via a separate service / API

      Customer Considerations

      • Administrative users / privileged users:
        • have unfettered access to this API and thus can install extensions on the cluster, escalating through OLM's permissions
        • are typically either cluster administrators or privileged users tasked with maintaining a particular service / extension on the cluster
        • are often SREs providing managed services enabled by the extension on the cluster
      • Tenants / non-privileged users
        • have only limited access to the cluster-level operator API, which is explicitly granted by a privileged user
        • have insight into what the extension represented by an instance of the cluster-level operator API provides
        • do not have the ability to self-sufficiently install extensions but can leverage a request / approval flow similar to that of certificate signing requests where some approvals might be given automatically

      Documentation Considerations

      • The nature of operators as cluster-wide extensions to the Kubernetes control plane needs to be highlighted, in contrast to the current notion of namespace-scoped operators that tenants can self-sufficiently install without any conflicts
      • The lines of visibility and responsibility on extension handling need to be explained
        • In 4.14 TP, there are no controls for the visibility of an operator. It is up to cluster admins to manage the lifecycle of operators, following the scenarios described in the documentation sections of:
          • OCPSTRAT-337 [Phase 1 MVP/Tech Preview] OLM 1.0 - Extension Catalogs (F1)
          • OCPSTRAT-443 [Phase 1 MVP/Tech Preview] OLM 1.0 - Extension Installation (F7)
          • OCPSTRAT-450 [Phase 1 MVP/Tech Preview] OLM 1.0 - Extension updates (F10)
          • OCPSTRAT-411 [Phase 1 MVP/Tech Preview] OLM 1.0 - Extension Removal (F16)
      • Users need to be made aware of the way OLM resolves constraints and related extensions together, causing extensions to update in lockstep
        • In 4.14 TP, OLM does not yet fully support operator updates that take dependencies on other operators into account. The supported installation and update paths are the scenarios described in the documentation sections of:
          • OCPSTRAT-443 [Phase 1 MVP/Tech Preview] OLM 1.0 - Extension Installation (F7)
          • OCPSTRAT-450 [Phase 1 MVP/Tech Preview] OLM 1.0 - Extension updates (F10)
      • Administrators need to know about the consequences of removing an operator in the form of a controller-based extension that ships CRDs, specifically about cascading removal that involves removing managed workloads
        • In 4.14 TP, removing an operator cleans up the operator's resources, including the CRDs it provides, as in the scenarios described in the documentation section of:
          • OCPSTRAT-411 [Phase 1 MVP/Tech Preview] OLM 1.0 - Extension Removal (F16)
      • Admins are responsible for ensuring the least privilege principle on the cluster, so they need to be exposed to the permission and access management concepts of the operator API
        • In 4.14 TP, the permission and access management controls of the operator API are not yet supported.

            People

              Daniel Messer
              Jian Zhang
              Matthew Werner
              Joe Lanford
              Senthamilarasu S