  1. Operator Runtime
  2. OPRUN-1579

A single API to manage Operators


    • OLM users have a single API to manage Operators
    • 8
    • olm
    • To Do
    • OCPPLAN-7733 - Operator API
    • 81% To Do, 0% In Progress, 19% Done
    • XL

      Customer Problem: Operator Management UX

      Cluster administrators are tasked to provide stable platform. Many of them use Operators to enhance the platform. They need to be able to install them reproducibly and easily conclude whether an installation is healthy and useable by the intended audience. In case of problems they need to be able to act quickly by finding the root cause in a short amount of time. A lot of cluster administrators are still novice in the Operator space. 

      Goal: Deliver an implementation of the Operator API that cluster admins can use to manage the lifecycle of Operators installed on a cluster and that cluster tenants can use to understand which Operator services are available.

      Problem: OLM is currently perceived as very complex. A user needs to be aware of at least 7 APIs (CatalogSource, PackageManifest, Subscription, ClusterServiceVersion, InstallPlan, OperatorGroup, OperatorSource) in order to bring an Operator under OLM management and get it to deploy. This hinders adoption, and retaining this setup while the rest of the OpenShift architecture places ever more requirements on OLM will only make matters worse.

      Why is this important: Although it deals with complex matters (enabling people who are not Kubernetes API experts to extend Kubernetes APIs), OLM needs to reduce its visible product surface in order to attract customers and Kubernetes users. Operator developers and users alike need a short route to success in getting their Operator deployed and tested. Not all use cases need the full-blown catalog setup that is required today.
      This will encourage short feedback loops for Operator developers, eventually increasing the velocity at which Operators are developed and updated, ultimately growing the ecosystem.

      User Feedback:

       

      Dependencies (internal and external):

       

      Prioritized epics + deliverables (in scope / not in scope):

      1. As an Operator Developer, I would like a way to deploy an operator bundle to a cluster using this API type, bypassing the concepts of CatalogSources, Subscriptions, and OperatorGroups.
      2. As an Operator Developer, I would like to include any standard runnable Kubernetes resource (Deployment, DaemonSet, StatefulSet, etc.) as components of my operator's installation.
      3. As a Cluster Admin, I would like this API type to emit events to reflect the lifecycle of the Operator as managed by OLM.
      4. As a Cluster Admin, I would like this API type to drive automatic proxy configuration of operator deployments.
      5. As a Cluster Admin, I would like to use a single API type to reproducibly install specific versions of an operator and its dependencies on any cluster.
      6. As a Cluster Admin, I would like to employ GitOps tooling in conjunction with this API to declaratively install and update Operators, relying on its continuous retries so that no manual / human intervention is required.
      7. As a Cluster Admin, I would like to rely on this API to force a declarative install or update of Operators, even if there is no update path or there are unsatisfied dependencies, so that I can quickly roll back from a failed update and maintain a managed service SLA.
      8. As a Cluster Admin, I would like to configure the update behavior of an Operator via this API so that it automatically rolls back from a failed update and the managed service SLA is maintained.
      9. As a Cluster Admin, I would like to use this API type to control in which namespaces the Operator has the permissions it requested as part of the metadata.
      10. As a Cluster Admin, I would like to use this API type to control in which namespaces the Operator can be discovered and used by tenants.
      11. As a Cluster Admin, I would like this API type to drive the update policy scoped to the referenced Operator bundle, so that I can control whether or not this Operator should be updated automatically, independently of its dependencies.
      12. As a Cluster Admin, I would like to use this API type to discover all potential issues reported from lower-level components so that debugging starts in a common place.
      13. As a Cluster Admin, I need this API and all related lower-level APIs to have extensive and comprehensible error reporting and messages, so that successful debugging requires neither an understanding of the interaction between the various lower-level APIs nor access to the OLM pod logs. Kubernetes conventions regarding status conditions should be adhered to.
      14. As a Cluster Admin, I can rely on this API to safely remove an Operator from the cluster so that related resources and configurations, e.g. CRD conversion webhooks, are cleaned up as well.
      15. As a Cluster Admin, I can rely on this API to optionally trigger a cascading delete of CRs managed by the Operator so that a complete removal, including the CRDs, can be performed.
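
      To make the stories above more concrete, here is a minimal sketch of what a single declarative Operator resource might carry. Everything in it is an assumption for illustration only: the field names, the `example.invalid/v1alpha1` API group, the `Operator` kind, and the bundle reference are hypothetical and are not taken from the enhancement proposal. The manifest omits a namespace because scoping is still an open question.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class UpdatePolicy:
    """Hypothetical update settings (stories 7, 8 and 11)."""
    automatic: bool = False             # update this Operator automatically
    rollback_on_failure: bool = False   # roll back after a failed update
    force: bool = False                 # proceed without an update path


@dataclass
class OperatorSpec:
    """Hypothetical spec of the single Operator API type."""
    bundle: str                                   # bundle reference (story 1)
    version: Optional[str] = None                 # pinned version (story 5)
    permitted_namespaces: List[str] = field(default_factory=list)  # story 9
    visible_namespaces: List[str] = field(default_factory=list)    # story 10
    update_policy: UpdatePolicy = field(default_factory=UpdatePolicy)
    cascade_delete_custom_resources: bool = False  # story 15


def to_manifest(name: str, spec: OperatorSpec) -> Dict[str, Any]:
    """Render a deterministic manifest suitable for GitOps tooling (story 6).

    Namespace lists are sorted so the same input always yields the same
    output, which keeps diffs in a Git repository stable.
    """
    return {
        "apiVersion": "example.invalid/v1alpha1",  # hypothetical API group
        "kind": "Operator",                        # hypothetical kind
        "metadata": {"name": name},
        "spec": {
            "bundle": spec.bundle,
            "version": spec.version,
            "permittedNamespaces": sorted(spec.permitted_namespaces),
            "visibleNamespaces": sorted(spec.visible_namespaces),
            "updatePolicy": {
                "automatic": spec.update_policy.automatic,
                "rollbackOnFailure": spec.update_policy.rollback_on_failure,
                "force": spec.update_policy.force,
            },
            "cascadeDeleteCustomResources": spec.cascade_delete_custom_resources,
        },
    }
```

      The point of the sketch is that one object holds what is spread across Subscription, OperatorGroup, and related resources today; the actual shape of the API is left to the enhancement discussion.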

      PM Guidance you can choose to ignore:

      The Operator API should not simply be an amalgamation of CSV, Subscription, and InstallPlan. It needs to significantly improve the configuration experience by focusing on the context of "Installing and Configuring an Operator" rather than the narrower focus of any of the sub-components it ends up using to accomplish the goal. The intent is to hide the complex interdependencies between Subscription<>CSV<>InstallPlan and present "Installing and Configuring an Operator" as a coherent experience.

      The Operator API also needs to serve as a translation layer for the conditions and error states found in the lower-level components, putting them back into the context of "Installing and Configuring an Operator" when needed.
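
      One way to picture such a translation layer is a function that folds the conditions of the lower-level components into a single top-level condition. The sketch below is illustrative only. It uses the `type` / `status` / `reason` / `message` fields from the Kubernetes status condition conventions, but the `Installed` condition type, the reason strings, and the assumption that every lower-level condition has positive polarity (`"True"` means healthy) are hypothetical and do not describe how OLM reports status today.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Condition:
    type: str
    status: str   # "True", "False" or "Unknown"
    reason: str
    message: str


@dataclass(frozen=True)
class ComponentCondition:
    component: str        # e.g. "Subscription", "InstallPlan"
    condition: Condition


def summarize(components: Iterable[ComponentCondition]) -> Condition:
    """Fold lower-level conditions into one hypothetical 'Installed' condition."""
    items = list(components)
    failing = [c for c in items if c.condition.status == "False"]
    unknown = [c for c in items if c.condition.status == "Unknown"]

    if failing:
        # Name the first failing component in the reason and list every
        # failure in the message, so debugging starts in one place.
        first = failing[0]
        return Condition(
            type="Installed",
            status="False",
            reason=f"{first.component}{first.condition.reason}",
            message="; ".join(
                f"{c.component}: {c.condition.message}" for c in failing
            ),
        )
    if unknown or not items:
        # Nothing has failed, but not every component has reported yet.
        return Condition(
            type="Installed",
            status="Unknown",
            reason="Progressing",
            message="waiting for lower-level components to report status",
        )
    return Condition(
        type="Installed",
        status="True",
        reason="AllComponentsReady",
        message="all lower-level components report ready",
    )
```

      With this shape, a cluster admin reads one condition on the Operator object and sees which component is blocking and why, without inspecting each lower-level resource or the OLM pod logs.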

       

      Open Questions:

      • Does the Operator API need to be cluster-scoped?
      • If the Operator API is cluster-scoped, do we need a separate API (Service) to project Operator service availability into namespaces for less privileged namespace admins?

      Risk:

      • scoping discussion not reaching conclusion

      Estimate (XS, S, M, L, XL, XXL): L

      Enhancement Pull Request: https://github.com/openshift/enhancements/pull/28

            njhale Nicholas Hale (Inactive)
            Jian Zhang Jian Zhang