    • 17% To Do, 0% In Progress, 83% Done
    • Undefined
    • XL

      Feature Goal

      • Provide a way for OLM Operators to be installed out of the box in OpenShift / by default as part of the cluster deployment using the OLM Subscription API
      • Provide a way for those operators to lifecycle with the cluster (upgrade when the cluster is upgraded, participate in blocking cluster install/upgrade completion until they are installed/upgraded)
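
As a rough illustration of what "using the OLM Subscription API" could look like in practice, the sketch below builds a Subscription manifest as a plain dictionary. The `apiVersion`, `kind`, and `spec` field names are those of the OLM Subscription resource; the helper name `build_subscription`, the package name, and the default namespaces are assumptions made for this example and are not decided by this feature.

```python
import json


def build_subscription(package, channel, catalog, starting_csv=None,
                       namespace="openshift-operators",
                       catalog_namespace="openshift-marketplace"):
    """Return an OLM Subscription manifest as a plain dict (illustrative helper)."""
    spec = {
        "name": package,                    # package name inside the catalog
        "channel": channel,                 # update channel the operator follows
        "source": catalog,                  # CatalogSource that provides the package
        "sourceNamespace": catalog_namespace,
        "installPlanApproval": "Automatic", # let OLM apply updates without manual approval
    }
    if starting_csv is not None:
        spec["startingCSV"] = starting_csv  # optionally pin the initially installed version
    return {
        "apiVersion": "operators.coreos.com/v1alpha1",
        "kind": "Subscription",
        "metadata": {"name": package, "namespace": namespace},
        "spec": spec,
    }


if __name__ == "__main__":
    # Package, channel and catalog names here are placeholders, not real deliverables.
    manifest = build_subscription("example-operator", "stable", "redhat-operators")
    print(json.dumps(manifest, indent=2))
```

A manifest of this shape could be placed among post-install manifests; how the installer would actually deliver it is left open by this issue.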

       

      See the platform operator portions of

      https://docs.google.com/document/d/1srswUYYHIbKT5PAC5ZuVos9T2rBnf7k0F1WV2zKUTrA/edit#heading=h.mduog8qznwz

      https://docs.google.com/presentation/d/1U2zYAyrNGBooGBuyQME8Xn905RvOPbVv3XFw3stddZw

       

      And this roadmap for platform operators:

      https://hackmd.io/MFwDuS_XS7quls3Q4ANKLw

       

      Why is this important?

      • Several Operator teams are looking to be installed on OpenShift by default for a regular cluster deployment, e.g. cert-manager, ServiceBindingOperator, LocalStorageOperator
      • This is an OpenShift Downstream ask, because the only other option to ship an operator with every cluster install would be to use CVO which does not allow opting out of cluster defaults without putting the cluster into an unsupported state

      Use Cases

      1. A way exists to configure which Operators should be installed by default that are not in the core payload (day0 operator) 
      2. The list of available day0 operators is under Red Hat's control
      3. Day0 operators should also be available via the regular OperatorHub catalogs
      4. Users are allowed to uninstall such default Operators post-install
      5. Users are allowed to configure whether such Operators should be installed by default on Day 0 or not (opt-out)
      6. A default Operator can temporarily prevent a cluster upgrade if it is in a critical process
      7. An OpenShift install is not considered complete until all of the day0 operators are installed
      8. An OpenShift install is considered failed if any of the day0 operators failed to install
      9. A default Operator can change versions as part of a cluster upgrade 
      10. A default Operator can be uninstalled as part of a cluster upgrade
      11. A default Operator can be removed such that only the controller is removed, but not the CRDs or CRs
      12. It should be possible for Red Hat to control which version of the day0 operator gets installed / updated
      13. It should be possible for Red Hat to control from which channel a day0 operator gets installed / updated
      14. A default Operator needs to be mirrored as part of mirroring OpenShift Release Payloads to enable a cluster in disconnected environments
      15. A default operator is directed to be installed but the required catalogs aren't available yet
      16. A disconnected customer wants to rely on Operators installed by default 
      17. A day0 operator failed to upgrade/install during a cluster upgrade/install and a cluster admin wants to troubleshoot why
      18. A day0 operator is in the process of being installed or upgraded and a cluster admin wants to observe progress
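
Use cases 5, 7 and 8 together imply a simple gating rule: ignore operators the user opted out of, fail the install if any remaining operator failed, and otherwise wait until all of them are installed. The following is a minimal sketch of that rule only; the phase names (`"installed"`, `"installing"`, `"failed"`) and the function `evaluate_install` are hypothetical and do not correspond to an existing API.

```python
from enum import Enum


class InstallState(Enum):
    COMPLETE = "complete"
    PROGRESSING = "progressing"
    FAILED = "failed"


def evaluate_install(operator_phases, opted_out=frozenset()):
    """Summarise the cluster install state from per-operator phases.

    operator_phases maps operator name -> "installed" | "installing" | "failed"
    (hypothetical phase names). Operators listed in opted_out are ignored.
    Returns (state, names of the operators responsible for that state).
    """
    considered = {n: p for n, p in operator_phases.items() if n not in opted_out}
    failed = sorted(n for n, p in considered.items() if p == "failed")
    pending = sorted(n for n, p in considered.items()
                     if p not in ("installed", "failed"))
    if failed:
        # Any failed day0 operator fails the whole install (use case 8).
        return InstallState.FAILED, failed
    if pending:
        # The install is not complete until every day0 operator is installed (use case 7).
        return InstallState.PROGRESSING, pending
    return InstallState.COMPLETE, []
```

Returning the names alongside the state is one way to support the troubleshooting and progress scenarios (use cases 17 and 18).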

      Day0 Operator Criteria

      • must ship in all arches that OCP supports

      Acceptance Criteria

      • The above user scenarios need to be achievable
      • OpenShift clusters for self-managed installs ship ServiceBindingOperator by default via this mechanism
      • It should be independent of the current OperatorHub API (which we are likely going to deprecate in the future)
      • CI - MUST be running successfully with tests automated
      • Release Technical Enablement - Provide necessary release enablement details and documents.
      • Downstream documentation

      Dependencies (internal and external)

      1. OpenShift installer needs to bundle desired day-0 operators in its payload / default scaffolding of post-install manifests.

      Potential Candidates for Day0 operators:

      1. cert-manager
      2. Service Binding Operator
      3. Local Storage Operator

      Open questions:

      1. Do we need a special catalog for day-0 operators? Should this be part of the payload vs. an optional component?
      2. What happens during cluster updates?
      3. Does this block cluster installs?
      4. How are these operators discovered for offline mirroring?
      5. Who is ultimately in control of day0 operator updates? The customer or Red Hat?
      6. Are day0 operators always auto-updating? Do we allow users to change that?
      7. Do we need to consider how to create an aggregated status that includes both CVO and OLM operators? (i.e. something that aggregates both CVO's ClusterOperator and OLM's OperatorCondition)
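
To make open question 7 more concrete, here is one possible shape for an aggregated status, offered purely as a discussion aid. It assumes each ClusterOperator is reduced to its `Available` and `Degraded` conditions and each OLM-managed operator to an `Upgradeable` condition; the function `aggregate_status` and the keys of the returned summary are invented for this sketch.

```python
def aggregate_status(cluster_operators, olm_operators):
    """Merge two status sources into one summary dict (illustrative only).

    cluster_operators: name -> {"Available": bool, "Degraded": bool}
    olm_operators:     name -> {"Upgradeable": bool}
    Missing conditions are treated conservatively for Available (False)
    and permissively for Degraded (False) and Upgradeable (True).
    """
    unavailable = sorted(n for n, c in cluster_operators.items()
                         if not c.get("Available", False))
    degraded = sorted(n for n, c in cluster_operators.items()
                      if c.get("Degraded", False))
    blocking = sorted(n for n, c in olm_operators.items()
                      if not c.get("Upgradeable", True))
    return {
        "healthy": not unavailable and not degraded,
        "upgradeable": not blocking,
        "unavailable": unavailable,
        "degraded": degraded,
        "blockingUpgrade": blocking,
    }
```

The defaults chosen for missing conditions are themselves an assumption and would need to be settled as part of answering the question.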

      Done Checklist

      • CI - CI is running, tests are automated and merged.
      • Release Enablement <link to Feature Enablement Presentation>
      • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
      • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
      • DEV - Downstream build attached to advisory: <link to errata>
      • QE - Test plans in Polarion: <link or reference to Polarion>
      • QE - Automated tests merged: <link or reference to automated tests>
      • DOC - Downstream documentation merged: <link to meaningful PR>

            [OCPPLAN-9555] Platform Operators

            Ben Parees added a comment -

            The ability to define such "profiles" for installations is definitely one of the intended benefits of the Composable OpenShift story in general. (It would apply equally to "a set of platform operators to enable/disable" as it would to "a set of optional capabilities in the payload that should be enabled/disabled").

            In fact the CVO capability feature already has a concept of capability sets. Today the only sets are basically "all on" and "all off", but other sets could be introduced. Whether (and how) the platform operator set should be coupled to the feature capability sets is something we'd want to think about.

            That said, another goal here is to make it possible to define such profiles externally to the payload, so that a third party (another Red Hat product team, or an external vendor) can define their own profile by picking the caps+operators they want. So the starting point for an implementation in my mind would be a wrapper around the install-config that enabled/disabled the desired things, instead of hardcoding profile definitions into the installer (though we could have some predefined).

            Daniel Del Ciancio added a comment - edited

            Have we considered the use of composable profile types? For example, a "disconnected" install would exclude a predefined list of operators, such as insights, to name one.

            Colin Walters added a comment -

            This goal makes total sense to me. I'd note that it overlaps, I think, with what users may want for Device Edge: in some cases they may also want to generate a single image (a versioned thing) that they transactionally upgrade to, with their workloads included (or at least some of them).

            Ben Parees added a comment -

            This should have https://issues.redhat.com/browse/OCPPLAN-9638 as a parent but Jira isn't cooperating.

            Ben Parees added a comment -

            "Nevertheless, I'm not at all sure that running a controller just to watch a singleton instance of the service to be installed makes sense, as the work of installing, for example, the certificate manager is only needed one time. If instead we have an "installation" controller able to check what needs to be installed as day-0 (certificate manager, service binding, localstorage, ...), then that makes sense, as only one controller should run on the cluster."

            I'm starting to get confused about which controller you are taking issue with or trying to avoid running. Cert manager, service binding, and localstorage controllers are certainly not only doing things at installation time. And the platform operator controller itself watches for new platform operators to install/uninstall/upgrade (and reports their status) at any time, not just install time.

            That said, and I'll let jlanford@redhat.com keep me honest... the Rukpak model for operator packaging (the next gen of operator packaging) will not actually require that "operators" (OLM-installable bundles, really) have any long-running workload resource at all. So if you want to create an "operator" that just defines some static resources, but doesn't define a long-running deployment/controller, you'd be able to do that. I think we'd want to talk through the use case and alternatives before doing it, but I believe it would at least be technically possible.

            Charles Moulliard added a comment -

            I fully agree with you bparees@redhat.com concerning the point that we need a framework for installing, upgrading, uninstalling, and reporting on the health of a workload/component.

            Nevertheless, I'm not at all sure that running a controller just to watch a singleton instance of the service to be installed makes sense, as the work of installing, for example, the certificate manager is only needed one time. If instead we have an "installation" controller able to check what needs to be installed as day-0 (certificate manager, service binding, localstorage, ...), then that makes sense, as only one controller should run on the cluster.

            Ben Parees added a comment -

            OLM provides two fundamental things:

            1) a framework for installing, upgrading, uninstalling, and reporting on the health of a workload/component
            2) a pattern for running operators/controllers that can provision instances of operands and maintain/manage those operands

            I think (1) is clearly necessary here.

            With respect to (2) we have many "singleton" operators (not necessarily OLM ones) in OCP, pretty much the entire core product. We also have OLM based ones like ACM.

            Just because you're only running one instance doesn't mean you don't need logic capable of configuring+managing that one instance.

            What would your alternative proposal be for how to include services like cert manager in clusters as an optional component? Keeping in mind that "add it to the payload" is not a viable solution because it couples the lifecycle of cert manager to the OCP lifecycle and makes it possible for cert manager bugs to block the release of OCP.

            Charles Moulliard added a comment -

            I like the idea of provisioning a cluster with some `packages/services` as the competition does (e.g. Tanzu). Ideally the packages to be installed should be associated with a profile in order to deploy what is relevant for a Developer, Builder, Security Admin, ...

            I doubt that using OLM would help here. For example, it is unnecessary to use an operator to install something like the Cert Manager, as we only need one instance installed per cluster. This is the reason why I don't much like using operators when we have to provision a cluster with services which are cluster-scoped (Certificate Manager, Vault, Service Binding, Tekton, ...).

            Ben Parees added a comment -

            I think this should be tied together with https://issues.redhat.com/browse/OLM-2513 all of which rolls up under https://issues.redhat.com/browse/OCPPLAN-7589

            Siamak Sadeghianfar added a comment -

            OpenShift GitOps is also a day0 candidate. The use case customers raise is that they want to go from zero to a cluster that is up, ready, and configured to sync configs from a Git repo to the cluster. To achieve that, the OpenShift GitOps operator needs to be installed post-cluster-installation, and then the operator needs to be configured to achieve the outcome in this use case. Note that just installing the operator wouldn't be sufficient in this case.

              Unassigned Unassigned
              DanielMesser Daniel Messer
              Jian Zhang Jian Zhang