HyperShift operator release process validation and documentation

    • Type: Task
    • Resolution: Done

      As a ROSA-HyperShift service owner, I want each component owner to demonstrate, document and confirm that they can safely release changes at least once per day.

      • Component owners demonstrate, document and confirm that they can safely release changes at least once per day.
      • The role of QE is documented, along with their expected path forward (tests fully automated).
      • Enumerate gaps and have an action plan if you cannot release at least once per day.
      • Review and confirm correctness of your row in this spreadsheet https://docs.google.com/spreadsheets/d/18Zee-5Q2-s66Oh7xGlwLkMvaUuGdlfbkwLQVYFosBH8
      • Provide a link to your validated release procedure documentation (OK to point to existing docs, if it's already been validated on HyperShift).

            [HOSTEDCP-613] HyperShift operator release process validation and documentation

            Antoni Segura Puimedon added a comment - Documented in https://docs.google.com/document/d/1pLN7Rekuee3KGcMjnwumgA0QOsJCF2GeirHsmj8ikM4/edit#heading=h.23xqy0q8f8zg

            Demethria Ramseur added a comment - agarcial@redhat.com   What is the status of this item? Is it ready to be closed? cc asegurap1@redhat.com  

            Alberto Garcia Lamela added a comment - Started to automate, document and formalise the current process and conventions here: https://docs.google.com/document/d/1pLN7Rekuee3KGcMjnwumgA0QOsJCF2GeirHsmj8ikM4/edit and https://github.com/openshift/hypershift/pull/2236 (see also https://redhat-internal.slack.com/archives/G01QS0P2F6W/p1677686587973519)

            Demethria Ramseur added a comment - agarcial@redhat.com - This is in progress, correct? Asking so the status can be updated.

            Alberto Garcia Lamela added a comment - I wouldn't say so; PR 1990 is a parallel effort towards a better implementation approach.

            In my mind, the outcome of this ticket at this point would be the formalisation of a process in a common doc, including but not limited to the following items:

            Shipping changes into prod post GA.

            • How often we plan to ship to each environment.
            • Define the gate to move to the next environment.
            • QE acknowledgement, if the plan is to rely on them to promote between envs.
            • Capture the time for a commit to go dev->prod (e.g. 3 weeks).

            Shipping changes during the release dev cycle.

            • How often we plan to ship to each environment.
            • Define the gate to move to the next environment.
            • QE acknowledgement, if the plan is to rely on them to promote between envs during the dev cycle.
            • Capture the time for a commit to go dev->stage (e.g. 2 weeks), so that devs can get feedback from QE and close the ticket.
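A minimal sketch of how the "capture time for a commit" items above could be measured, assuming merge and deployment timestamps are available from somewhere. The function names and the timestamps are hypothetical; only the 2-week and 3-week targets come from the examples in the comment.

```python
from datetime import datetime, timedelta


def lead_time(merged_at: datetime, deployed_at: datetime) -> timedelta:
    """Elapsed time between a commit merging and it reaching an environment."""
    if deployed_at < merged_at:
        raise ValueError("deployment cannot precede the merge")
    return deployed_at - merged_at


def within_target(merged_at: datetime, deployed_at: datetime, target: timedelta) -> bool:
    """True when the commit reached the environment within the agreed target."""
    return lead_time(merged_at, deployed_at) <= target


# Hypothetical timestamps for one commit.
merged = datetime(2023, 3, 1, 12, 0)
in_stage = datetime(2023, 3, 13, 12, 0)
in_prod = datetime(2023, 3, 24, 12, 0)

stage_ok = within_target(merged, in_stage, timedelta(weeks=2))
prod_ok = within_target(merged, in_prod, timedelta(weeks=3))
```

Tracking the two targets separately keeps the dev->stage feedback loop visible even when the dev->prod path is slower.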

            Demethria Ramseur added a comment - patmarti@redhat.com Is this just pending a LGTM to merge PR 1990? cc agarcial@redhat.com

            Mark Turansky added a comment - "a set of app-interface saas-files to configure the deployment of that template onto all hive shards, across environments and defined sectors"

            This is where I can help. Once in app-interface, I'd like to help with agarcial@redhat.com's suggestion: "They can also access OCM to create HC if that's really what we want."

            I'd like to set up the canaries and rings and all that, and I think what you two are suggesting would allow us to do that.

            Patrick Martin added a comment - edited

            Let's not mix too many things in here, mturansk. I think addon progressive rollout, while planned for this quarter and needed for other purposes, will not bring much for the hypershift operator.

            The hypershift operator is not an OCM addon itself. The current deployment model is to deploy it as an ACM addon. To allow fast rollouts of fixes, an "override" configmap can be managed by OSDFM to set the hypershift image and image-tag.
            So we end up with

            • team interdependencies: hypershift team needs to talk to either ACM and/or OSDFM to deploy something
            • progressive rollout challenges for overrides

            My proposal in PR 1990 is to give autonomy back to the hypershift team by relying on app-interface saas deployment mechanisms that are as standard as possible, allowing progressive rollout:

            • the hypershift-operator repo generates/contains a template of a SelectorSyncSet which contains everything that's needed for the hypershift operator to run, except specific configs to be delivered via secrets
            • a set of app-interface saas-files to configure the deployment of that template onto all hive shards, across environments and defined sectors
            • gated (and automated) promotions through environment and sectors with standard validation Jobs defined in app-interface saas-files. What those jobs do is to be defined, as Alberto mentioned. They can check RHOBS metrics, possibly access management clusters if they run for example on the backplane clusters. They can also access OCM to create HC if that's really what we want.
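The gated promotion described in the list above could be sketched roughly as follows. This is only an illustration of the control flow: the environment and sector names, `deploy`, and `gate` are hypothetical stand-ins, not app-interface APIs, and what a real validation job checks is still to be defined.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, str]  # (environment, sector); the names used below are hypothetical


def promote(
    image_tag: str,
    stages: List[Stage],
    deploy: Callable[[Stage, str], None],
    gate: Callable[[Stage, str], bool],
) -> List[Stage]:
    """Deploy to each stage in order and stop at the first failing gate.

    Returns the stages whose validation gate passed.
    """
    promoted: List[Stage] = []
    for stage in stages:
        deploy(stage, image_tag)
        if not gate(stage, image_tag):
            break  # do not roll out any further
        promoted.append(stage)
    return promoted


STAGES: List[Stage] = [
    ("integration", "canary"),
    ("stage", "canary"),
    ("stage", "main"),
    ("production", "canary"),
    ("production", "main"),
]

deployed: List[Stage] = []


def fake_deploy(stage: Stage, image_tag: str) -> None:
    deployed.append(stage)


def fake_gate(stage: Stage, image_tag: str) -> bool:
    # Pretend the validation job fails on the production canary sector.
    return stage != ("production", "canary")


result = promote("abc123", STAGES, fake_deploy, fake_gate)
```

In this run the rollout reaches the production canary sector, its gate fails, and the remaining production sector is never touched.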

            Such a SSS deployment relies on:

            • Labels on Hive ClusterDeployment for the cluster type (we have that already) and the sector (the current PR relies on a region label, but can be changed to whatever)
            • Some data in secrets for the AWS region and external-dns txt-owner-id. This data should be provided by OSD FM at setup time only, via CS SyncSet API. OSD FM already provides this info to ACM today here.
              • we can optionally set external-dns domain-filter via a secret as well, since the set of Route53 could be changed by another component like CS
            • ACM still needs to register the management clusters as hypershift clusters to enable auto-registration of HostedClusters. I believe that is fine with the current configuration.
            • and the validation job definition mentioned above.
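As a rough sketch of the label-based targeting listed above: a selector with exact-match labels picks a ClusterDeployment only when every selector key is present with the same value. The label keys, cluster names and values below are made up for illustration; the real keys are whatever the PR and Hive configuration define.

```python
from typing import Dict, List


def matches(match_labels: Dict[str, str], labels: Dict[str, str]) -> bool:
    """Exact-match label selection: every selector key must be present with the same value."""
    return all(labels.get(key) == value for key, value in match_labels.items())


def select_targets(cluster_deployments: List[dict], match_labels: Dict[str, str]) -> List[str]:
    """Names of the cluster deployments that the selector would apply to."""
    return [cd["name"] for cd in cluster_deployments if matches(match_labels, cd["labels"])]


# Hypothetical label keys and clusters.
CLUSTERS = [
    {"name": "mc-1", "labels": {"example.com/cluster-type": "management-cluster", "example.com/sector": "canary"}},
    {"name": "mc-2", "labels": {"example.com/cluster-type": "management-cluster", "example.com/sector": "main"}},
    {"name": "osd-1", "labels": {"example.com/cluster-type": "osd", "example.com/sector": "canary"}},
]

canary_targets = select_targets(
    CLUSTERS,
    {"example.com/cluster-type": "management-cluster", "example.com/sector": "canary"},
)
```

Changing the sector label from a region to something else, as the comment suggests, would only change the selector values, not the matching logic.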


            Alberto Garcia Lamela added a comment - This is good info, thanks folks.

            > I am not sure running a full e2e test of HC creation via OCM makes the most sense as it would be sensitive to other components behavior (eg CS or ACM glitch would fail the hypershift operator deployment?)

            This is literally the goal and the gate of this test, i.e. to validate that the components work together so we deliver a product. It's the most basic ROSA black-box smoke test. Components in isolation and API contracts are already tested in their own repos.

            I'd expect any commit of any software rolled out into an environment that belongs to the delivery pipeline ending in prod to be gated, at the very minimum, on rosa create/delete.
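A minimal sketch of a create/delete smoke gate in the spirit of the comment above. The `create` and `delete` callables are hypothetical placeholders for whatever actually drives cluster creation and deletion; nothing here calls a real CLI or API.

```python
from typing import Callable


def smoke_gate(create: Callable[[], str], delete: Callable[[str], None]) -> bool:
    """Black-box gate: passes only if a cluster can be created and then deleted.

    Any exception from either step fails the gate, regardless of which
    component caused it, since the point is to validate the whole product path.
    """
    try:
        cluster_id = create()
    except Exception:
        return False
    try:
        delete(cluster_id)
    except Exception:
        return False
    return True


# Hypothetical fakes standing in for the real create/delete steps.
def create_ok() -> str:
    return "cluster-123"


def create_fails() -> str:
    raise RuntimeError("dependency glitch during creation")


deleted = []


def delete_ok(cluster_id: str) -> None:
    deleted.append(cluster_id)


def delete_fails(cluster_id: str) -> None:
    raise RuntimeError("deletion failed")


gate_passed = smoke_gate(create_ok, delete_ok)
gate_blocked = smoke_gate(create_fails, delete_ok)
```

A failure in any dependency fails the gate on purpose, which is the trade-off debated in this thread.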

            Mark Turansky added a comment - I agree with agarcial@redhat.com and that's my focus. I am working to make that happen.

            App-interface, however, does not provide CI/CD promotions, jobs and so on for AddOns. Only things we deploy on app-interface, like OCM, benefit from the CI/CD promotions. Versions of OSD, AddOns, etc. do not.

            So how do we progressively deliver things on the clusters?

            I think we're missing a piece like an AddOn Promoter that can mimic app-interface promotion functionality (see https://docs.google.com/document/d/1_vZx5WdUKzKjIOmOlCs0WEr9pARiqSXE-JJsmmrJ7Mc/edit#heading=h.70vpe4ijsm0u ). This requires a Sector implementation baked into AS, like this: https://gitlab.cee.redhat.com/ocm/ocm-addons-service/-/merge_requests/133

            I think the data model in that MR is correct. We can promote addons through sectors by POSTing to OCM. patmarti@redhat.com I think you referenced something like this in your master spreadsheet of components. I believe there was a lingering "what actor does this? OCM?" kind of question. I think the answer is yes.

              agarcial@redhat.com Alberto Garcia Lamela
              jeder@redhat.com Jeremy Eder