  1. OpenShift Container Platform (OCP) Strategy
  2. OCPSTRAT-268

[GA release] OLM 1.0 Behavioral requirements - Simplified API control surface


    • Type: Outcome
    • Resolution: Unresolved
    • Priority: Major
    • Operator Framework
    • OCPSTRAT-27 OLM V1: Operators, Operator Lifecycle Management, and Operator Hub
    • 50% To Do, 0% In Progress, 50% Done

      Feature Overview

      • Give cluster administrators a central, cluster-level API to manage extensions commonly referred to as operators
      • While the underlying implementation of the functional requirements can be carried out by different controllers with different APIs, to the administrative user there should be a single, cluster-level API that represents an installed extension with all high-level controls
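
To make the idea of a single, cluster-level object more concrete, the sketch below builds one such manifest as plain data. It is an illustration only: the API group `olm.example/v1alpha1` and the `packageName` and `version` fields are assumptions made for this example, not names taken from this issue. The only detail taken from the requirements is that the API is named `operator`.

```python
import json


def operator_manifest(name, package_name, version=None):
    """Build a hypothetical cluster-scoped Operator object as a dict.

    The apiVersion and all spec field names are illustrative assumptions.
    """
    spec = {"packageName": package_name}
    if version is not None:
        # Pinning a version is optional in this sketch.
        spec["version"] = version
    return {
        "apiVersion": "olm.example/v1alpha1",  # hypothetical API group
        "kind": "Operator",
        # Cluster-scoped: metadata carries a name but no namespace.
        "metadata": {"name": name},
        "spec": spec,
    }


manifest = operator_manifest("example-extension", "example-extension", "1.2.0")
print(json.dumps(manifest, indent=2))
```

Because the object is cluster-scoped, there is exactly one place to look for a given extension, regardless of which controllers do the work behind it.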

      Goals

      • Provide a central, easy-to-understand API for installing, configuring, updating and removing extensions
      • Enable GitOps-driven workflow with a single, declarative control surface to install, update, configure and manage permissions of an extension
      • Ease troubleshooting and debugging by coalescing all relevant output and status from behind-the-scenes components and processes in a top-level aggregate object
      • Simplified status reporting of extension availability and readiness
      • A simple point of reference/input for other services / APIs in the system
      • Compared to OLM 0.x, a cluster-level API reduces resource usage in the system and alleviates many of the problems associated with multiple, conflicting installs of the same extension at the namespace scope
      • Unblock constraint/dependency resolution by resolving at the cluster level instead of the namespace level
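
The goal of coalescing status from behind-the-scenes components into one top-level object could look roughly like the sketch below. The component names, the `ComponentStatus` type, and the shape of the aggregate result are all hypothetical, chosen only to illustrate the idea. The issue does not prescribe a status format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentStatus:
    component: str  # hypothetical name of a behind-the-scenes component
    ready: bool
    message: str


def aggregate(statuses):
    """Fold per-component statuses into one user-facing summary."""
    if not statuses:
        # Nothing reported yet: do not claim readiness.
        return {"ready": False, "message": "No component has reported status yet."}
    failing = [s for s in statuses if not s.ready]
    if not failing:
        return {"ready": True, "message": "All components report ready."}
    # Surface every failing component so users need not inspect lower-level APIs.
    details = "; ".join(f"{s.component}: {s.message}" for s in failing)
    return {"ready": False, "message": details}


summary = aggregate([
    ComponentStatus("resolver", True, "resolution succeeded"),
    ComponentStatus("bundle-unpacker", False, "image pull failed"),
])
print(summary)
```

A real implementation would more likely use Kubernetes-style conditions. The point here is only that the top-level object carries enough detail that lower-level objects need not be consulted first.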

      Functional Requirements

      Requirement | Notes | isMvp?
      Surfaces provided APIs and features | CRDs | Done
      Must interface / reference with existing catalogs | OCPBU-107 / F1 | Done
      Allows declaratively installing an extension | OCPBU-109 / F7 | Done
      Triggers extension updates when setting a desired target version | OCPBU-549 / F10 (Phase 1 scope) | Done
      Allows triggering extension removal | OCPBU-111 / F16 | Done
      The API should be named `operator` | | Done
      CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES
      Release Technical Enablement | Provide necessary release enablement details and documents. | YES
      Input into dependency / constraint resolution preview | F3 | NO
      Lists resolved dependency | F4 | NO
      Input into permission preview | F5 | NO
      Triggers and surfaces install / update preflight checks | OCPBU-108 / F6 | YES NO
      Allows configuring the auto-update policy | F8 | NO
      Reports and emits events about available updates to the extensions | F9 | NO
      Is subject to request / approval flow of user-triggered extension installs / updates | F11 | NO
      Non-privileged users can discover whether an extension is available to them in a namespace | F12 | YES NO
      Allows specifying the list of namespaces in which the extension should be granted its requested permissions | OCPPLAN-9681 / F13 | YES NO
      Surfaces extension permission escalation during updates | F14 | NO
      Allows specifying subsets of permissions the extension should get in a list of namespaces | F15 | NO
      Allows triggering cascading extension removal | F17 | NO
      Supports extension installation from a standalone RukPak bundle | F18 | NO
      Triggers and surfaces results of constraint evaluation | OCPBU-112 / F19 | YES NO
      Reports aggregated extension health | F20 | YES NO
      Reports custom extension health | F21 | NO
      Allows controlling rollback behavior | F23 | NO
      Supports customization through overrides | F24 | NO
      Reports custom upgrade readiness | F25 | NO
      Supports canary style rollouts | F26 | NO

      Behavioral Requirements

      Requirement | Notes | isMvp?
      Single API as the primary touch point for users during regular usage and troubleshooting | Lower-level APIs (RukPak, Deppy) should not need to be consulted except for diagnosis by support personnel or in rare edge cases | YES
      GitOps-friendliness | All major operations need to be triggerable with a single PATCH | YES
      Declarative Control | All operations have to be carried out in a goal-seeking, eventually consistent fashion and should not conduct one-off attempts that need to be manually reset / retried | YES
      Force overrides | Provide declarative override switches that disable constraint and dependency resolution and ignore the update path to give SREs an escape hatch | YES
      Scalability and resource consumption | Consider the case where OLM is installed in a cluster with hundreds of extensions installed, thousands of namespaces and users | YES
      Event trail | All major operations and state changes should trigger a Kubernetes event associated with the extension in question to aid with troubleshooting and auditing | NO
      Human readable status | For all in-transit, failure or success states we need to surface user-friendly state information that does not presume knowledge of OLM's internal workings or custom controllers | NO
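
One way to read the "single PATCH" requirement is that an update amounts to changing one field of the declared object. The sketch below applies a JSON Merge Patch (RFC 7386) to a hypothetical Operator object. The object layout and the `packageName` and `version` field names are assumptions for illustration. Merge patch is only one of several patch formats that could satisfy the requirement.

```python
import copy


def merge_patch(target, patch):
    """Apply a JSON Merge Patch (RFC 7386) and return a patched copy."""
    if not isinstance(patch, dict):
        # A non-object patch replaces the target wholesale.
        return copy.deepcopy(patch)
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            # null in a merge patch means "remove this member".
            result.pop(key, None)
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


# Hypothetical installed object; field names are illustrative assumptions.
installed = {
    "kind": "Operator",
    "metadata": {"name": "example-extension"},
    "spec": {"packageName": "example-extension", "version": "1.2.0"},
}

# A single patch expresses the desired target version.
updated = merge_patch(installed, {"spec": {"version": "1.3.0"}})

# Dropping the pin is also a single patch.
unpinned = merge_patch(updated, {"spec": {"version": None}})
```

Applying the same patch twice yields the same object, which suits the goal-seeking behavior described in the table: the patch states the desired end state rather than a one-off action.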

      Use Cases

      Main Use Cases:

      • Extension installation and removal
      • Extension updates or downgrades
      • Extension permission management (what can the extension do)
      • Extension access management (who can use the extension)
      • Extension reporting on health and update readiness
      • Update policy management

      Alternative Use Cases:

      • Constraint resolution status review
      • Extension update availability
      • Request / Approval Flow for non-cluster-admin initiated installs
      • Canary style rollout control
      • Customizing placement, environment variables, volume mounts, annotations and labels of extension components

      Out of Scope

      • Extension catalog discovery (F2) - show users all available extensions in all possible versions in all possible channels (likely a separate API / service leveraged by a UI / CLI plugin)
      • Dependency Preview (F4) - show users upfront what a particular extension depends on (likely a separate API / service leveraged by a UI / CLI plugin)
      • Permission preview (F5) - show users upfront what permissions a particular extension requires (likely a separate API / service leveraged by a UI / CLI plugin)

      Background, and strategic fit

      This is part of a larger effort to re-design vital parts of the OLM APIs and conceptual models to fit the use case of OLM in managed service environments, GitOps-controlled infrastructure and restrictive self-managed deployments in Enterprise environments. You can learn more about it here: https://docs.google.com/document/d/1LX4dJMbSmuIMn98tCiBaONmTunWR6zdUJuH-8uZ8Cno/edit?usp=sharing

      Assumptions

      • Operators are extensions supported by OLM that are handled at the cluster level
      • Constraint and dependency resolution during installation and updates happens at the cluster level
      • Functionality to aid the user experience for interactive installs like permission and constraint preview is supplied via a separate service / API

      Customer Considerations

      • Administrative users / privileged users:
        • have unfettered access to this API and can thus install extensions on the cluster, escalating through OLM's permissions
        • are typically either cluster administrators or privileged users tasked with maintaining a particular service / extension on the cluster
        • are often SREs providing managed services enabled by the extension on the cluster
      • Tenants / non-privileged users
        • have only limited access to the cluster-level operator API, which is explicitly granted by a privileged user
        • have insight into what the extension represented by an instance of the cluster-level operator API provides
        • do not have the ability to self-sufficiently install extensions but can leverage a request / approval flow similar to that of certificate signing requests where some approvals might be given automatically

      Documentation Considerations

      • the nature of operators as cluster-wide extensions to the Kubernetes control plane needs to be highlighted, in contrast to the current notion of namespace-scoped operators, which tenants can install self-sufficiently without any conflicts
      • the lines of visibility and responsibility on extension handling need to be explained
      • users need to be made aware of the way OLM resolves constraints and related extensions together, causing extensions to update in lockstep
      • administrators need to know about the consequences of removing an operator in the form of a controller-based extension that ships CRDs, specifically about cascading removal that involves removing managed workloads
      • admins are responsible for ensuring the least privilege principle on the cluster; they need to be exposed to the permission and access management concept of the operator API

       

            rhn-coreos-tunwu Tony Wu
            DanielMesser Daniel Messer
            Joe Lanford Joe Lanford