[HOSTEDCP-613] HyperShift operator release process validation and documentation

Type: Task
Resolution: Done
Priority: Undefined
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
- sd-hypershift

Blocked: False
Blocked Reason: None
Ready: False
Epic Link: SDE-2308

WSJF: 0
Risk Score: 0

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

As a ROSA-HyperShift service owner, I want each component owner to demonstrate, document and confirm that they can safely release changes at least once per day.

Component owners demonstrate, document and confirm that they can safely release changes at least once per day.
The role of QE is documented, along with their expected path forward (tests fully automated).
Enumerate gaps and have an action plan if you cannot release at least once per day.
Review and confirm correctness of your row in this spreadsheet https://docs.google.com/spreadsheets/d/18Zee-5Q2-s66Oh7xGlwLkMvaUuGdlfbkwLQVYFosBH8
Provide a link to your validated release procedure documentation (OK to point to existing docs, if it's already been validated on HyperShift).

is blocked by

- ACM-2155 Hypershift operator continuous delivery to Service Delivery (Closed)
- HOSTEDCP-633 Productised CD pipeline for hypershift operator (Closed)

links to

openshift/hypershift#1990: Proposal: generate a hypershift-operator selectorsyncset template

Antoni Segura Puimedon added a comment - 2023/06/06 4:41 PM

Documented in https://docs.google.com/document/d/1pLN7Rekuee3KGcMjnwumgA0QOsJCF2GeirHsmj8ikM4/edit#heading=h.23xqy0q8f8zg


Demethria Ramseur added a comment - 2023/05/24 12:31 PM

agarcial@redhat.com What is the status of this item? Is it ready to be closed?

cc asegurap1@redhat.com


Alberto Garcia Lamela added a comment - 2023/03/06 8:03 AM

Started to automate and document / formalise current process and conventions here https://docs.google.com/document/d/1pLN7Rekuee3KGcMjnwumgA0QOsJCF2GeirHsmj8ikM4/edit

and https://github.com/openshift/hypershift/pull/2236

See also https://redhat-internal.slack.com/archives/G01QS0P2F6W/p1677686587973519


Demethria Ramseur added a comment - 2023/03/01 10:02 PM

agarcial@redhat.com - This is in progress, correct? Asking so the status can be updated.


Alberto Garcia Lamela added a comment - 2023/02/28 8:14 AM

I wouldn't say so; 1990 is a parallel effort for a better implementation approach.

In my mind the outcome of this ticket at this point would be the formalisation of a process in a common doc or so, including but not limited to the following items:

Shipping changes into prod post GA:

- How often we are planning to ship to each environment.
- Define the gate to move to the next environment.
- QE acknowledgement, if the plan is to rely on them to promote between envs.
- Capture the time for a commit to go dev->prod (e.g. 3 weeks).

Shipping changes during the release dev cycle:

- How often we are planning to ship to each environment.
- Define the gate to move to the next environment.
- QE acknowledgement, if the plan is to rely on them to promote between envs during the dev cycle.
- Capture the time for a commit to go dev->stage (e.g. 2 weeks), so that devs can get feedback from QE and close the ticket.


Demethria Ramseur added a comment - 2023/02/27 7:27 PM

patmarti@redhat.com Is this just pending a LGTM to merge PR 1990?
cc agarcial@redhat.com


Mark Turansky added a comment - 2023/01/05 2:16 PM

"a set of app-interface saas-files to configure the deployment of that template onto all hive shards, across environments and defined sectors"

This is where I can help. Once in app-interface, I'd like to help with agarcial@redhat.com's suggestion: "They can also access OCM to create HC if that's really what we want."

I'd like to set up the canaries and rings and all that, and I think what you two are suggesting would allow us to do that.


Patrick Martin added a comment - 2023/01/05 9:21 AM - edited 2023/01/05 9:22 AM

Let's not mix too many things in here, mturansk. I think addon progressive rollout, while planned for this quarter and needed for other purposes, will not bring much for the hypershift operator.

Hypershift operator is not an OCM addon itself. The current deployment model is to deploy it as an ACM addon. To allow fast rollouts of fixes, an "override" configmap can be managed by OSDFM to set the hypershift image and image-tag.
So we end up with:

- team interdependencies: the hypershift team needs to talk to ACM and/or OSDFM to deploy something
- progressive rollout challenges for overrides

My proposal in PR 1990 is to give autonomy back to the hypershift team by relying on app-interface saas deployment mechanisms that are as standard as possible, allowing progressive rollout:

- The hypershift-operator repo generates/contains a template of a SelectorSyncSet which contains everything that's needed for the hypershift operator to run, except specific configs to be delivered via secrets.
- A set of app-interface saas-files to configure the deployment of that template onto all hive shards, across environments and defined sectors.
- Gated (and automated) promotions through environments and sectors with standard validation Jobs defined in app-interface saas-files. What those jobs do is to be defined, as Alberto mentioned. They can check RHOBS metrics, and possibly access management clusters if they run, for example, on the backplane clusters. They can also access OCM to create HC if that's really what we want.
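
The gated promotion in the last item could be sketched as follows. This is a minimal illustration under stated assumptions, not how app-interface implements promotions: the `promote` function, the sector names and the `deploy`/`validate` callables are all hypothetical, and a real validation job might instead check RHOBS metrics or create an HC via OCM.

```python
from typing import Callable, Iterable, List


def promote(sectors: Iterable[str],
            deploy: Callable[[str], None],
            validate: Callable[[str], bool]) -> List[str]:
    """Deploy to each sector in order, stopping at the first failed gate.

    Returns the sectors that were both deployed and validated.
    """
    promoted: List[str] = []
    for sector in sectors:
        deploy(sector)
        if not validate(sector):
            # Gate failed: later sectors never receive this change.
            break
        promoted.append(sector)
    return promoted


if __name__ == "__main__":
    deployed: List[str] = []
    # Hypothetical run in which validation fails in the "stage" sector.
    result = promote(
        ["integration", "stage", "production"],
        deploy=deployed.append,
        validate=lambda sector: sector != "stage",
    )
    print("promoted:", result)
    print("deployed:", deployed)
```

In this run the change reaches "stage" but is never deployed to "production", which is the property the gate is meant to guarantee.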

Such a SSS deployment relies on:

- Labels on Hive ClusterDeployment for the cluster type (we have that already) and the sector (the current PR relies on a region label, but this can be changed to whatever).
- Some data in secrets for the AWS region and external-dns txt-owner-id. This data should be provided by OSD FM at setup time only, via CS SyncSet API. OSD FM already provides this info to ACM today here.
  - We can optionally set external-dns domain-filter via a secret as well, since the set of Route53 could be changed by another component like CS.
- ACM still needs to register the management clusters as hypershift clusters to enable auto-registration of HostedClusters. I believe that is fine with the current configuration.
- The validation job definition mentioned above.
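
To make the label-based targeting concrete, here is a rough sketch of what such a SelectorSyncSet could look like, built as a plain Python dict and printed as JSON. The `apiVersion`, `kind` and `spec` field names follow the Hive SelectorSyncSet API; everything else (the helper name, the label keys and values, the namespace resource) is a hypothetical placeholder and not what PR 1990 generates. The secret-delivered data (AWS region, txt-owner-id) is deliberately left out.

```python
import json
from typing import Any, Dict, List


def selector_sync_set(name: str,
                      match_labels: Dict[str, str],
                      resources: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build a Hive SelectorSyncSet manifest as a plain dict."""
    return {
        "apiVersion": "hive.openshift.io/v1",
        "kind": "SelectorSyncSet",
        "metadata": {"name": name},
        "spec": {
            # Only ClusterDeployments carrying all of these labels are targeted.
            "clusterDeploymentSelector": {"matchLabels": match_labels},
            # "Sync" also deletes resources that are later dropped from the list.
            "resourceApplyMode": "Sync",
            "resources": resources,
        },
    }


if __name__ == "__main__":
    # Hypothetical label keys and values, for illustration only.
    manifest = selector_sync_set(
        name="hypershift-operator-example",
        match_labels={
            "example.com/cluster-type": "management-cluster",
            "example.com/sector": "canary",
        },
        resources=[
            {
                "apiVersion": "v1",
                "kind": "Namespace",
                "metadata": {"name": "hypershift"},
            }
        ],
    )
    print(json.dumps(manifest, indent=2))
```

Changing the sector label value per saas-file target is one way the same template could be rolled out sector by sector.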


Alberto Garcia Lamela added a comment - 2023/01/05 9:04 AM

This is good info, thanks folks.

>I am not sure running a full e2e test of HC creation via OCM makes the most sense as it would be sensitive to other components behavior (eg CS or ACM glitch would fail the hypershift operator deployment?)

This is literally the goal and the gate of this test, i.e. to validate that the components work together so we deliver a product. It's the most basic rosa black box smoke test. Components in isolation and API contracts are already tested in their own repos.

I'd expect any commit of any software rolled out into an environment that belongs to the delivery pipeline that ends up in prod to be gated at the very minimum on rosa create/delete.
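
As a rough sketch of that minimum gate (an illustration only: `smoke_gate` and its `create`/`delete` callables are hypothetical, and in practice they would wrap whatever performs rosa create/delete in the target environment):

```python
from typing import Callable


def smoke_gate(create: Callable[[], str],
               delete: Callable[[str], None]) -> bool:
    """Return True only if a cluster can be created and then deleted."""
    try:
        cluster_id = create()
    except Exception:
        # Creation failed, so the rollout must not be promoted.
        return False
    try:
        delete(cluster_id)
    except Exception:
        # Deletion is part of the gate: a cluster that cannot be removed fails it.
        return False
    return True


if __name__ == "__main__":
    # Stand-in callables; real ones would talk to the environment under test.
    print(smoke_gate(create=lambda: "example-cluster", delete=lambda cid: None))
```

Because the gate goes through the whole product path, a CS or ACM failure fails it too, which is the intent described above.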


Mark Turansky added a comment - 2023/01/04 6:44 PM

I agree with agarcial@redhat.com and that's my focus. I am working to make that happen.

App-interface, however, does not provide ci/cd promotions and jobs and whatnot for AddOns. Only things we deploy on app-interface, like OCM, benefit from the ci/cd promotions. Versions of OSD, AddOns, etc. do not.

So how do we progressively deliver things on the clusters?

I think we're missing a piece like an AddOn Promoter that can mimic app-interface promotion functionality (see https://docs.google.com/document/d/1_vZx5WdUKzKjIOmOlCs0WEr9pARiqSXE-JJsmmrJ7Mc/edit#heading=h.70vpe4ijsm0u). This requires a Sector implementation baked into AS, like this: https://gitlab.cee.redhat.com/ocm/ocm-addons-service/-/merge_requests/133

I think the data model in that MR is correct. We can promote addons through sectors by POSTing to OCM. patmarti@redhat.com I think you referenced something like this in your master spreadsheet of components. I believe there was a lingering "what actor does this? OCM?" kind of question. I think the answer is yes.


Assignee: Alberto Garcia Lamela
Reporter: Jeremy Eder
Votes: 0
Watchers: 14
Due: 2023/03/04
Created: 2022/10/25 4:56 PM
Updated: 2025/03/30 5:18 PM
Resolved: 2023/06/06 4:41 PM
