OpenShift Over the Air / OTA-743

What impact does upgrade edge gating have on the cluster fleet?


    • Type: Epic
    • Resolution: Won't Do
    • Priority: Normal
    • Epic Name: Impact of upgrade edge gating
    • Status: To Do

      As a member of the OTA team

      I understand the impact of our new edge gating automation (graph-data#2059, example)

      So that I can either improve it or stand in awe of its awesomeness.

       

      Background:

      After a few years of managing the upgrade graph we've learned to be careful with how we introduce new edges.  Without care we can lead customers to a temporary "dead-end" release.  A dead-end release is an errata release that would normally allow you to upgrade to the next minor but "surprisingly" does not yet.  Consider the following fictitious example:

      1. Current update recommendations are:
        1. A.1 -> A.2
        2. A.1 -> A.3
        3. A.2 -> A.3
        4. A.2 -> B.1
        5. But no update recommendation from A.1 to B.1, because A.1 is missing a pre-update guard that was added in A.2.
        6. And no update recommendation from A.3 to B.1, because A.3 includes a newer fix, which would regress when updating to B.1.
      2. A customer installed version A.1.  They want to upgrade to version B.
      3. In order to get there they must first upgrade to a newer version of A since A.1 has no paths to B.
      4. They upgrade to A.3 since that's the latest version of A available and the stable upgrades from A to B have been out for months.
      5. Surprisingly, there's no upgrade available from A.3 to B yet because, at the last minute, the engineering team discovered a problem in the latest B.2 release.  That release was actually "tombstoned", i.e. never released.  (The sketch after this list models this graph.)
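
      A minimal sketch of the fictitious graph above (the edge table, the reaches_minor helper, and the literal version names are all illustrative; this is not show-edges.py or any real tooling):

        # Fictitious update recommendations from the example above.
        edges = {
            "A.1": {"A.2", "A.3"},
            "A.2": {"A.3", "B.1"},
            "A.3": set(),  # no edge to B.1: it would regress A.3's newer fix
        }

        def reaches_minor(start, minor_prefix, graph):
            """Depth-first search: can we reach any release in minor_prefix from start?"""
            stack, seen = [start], set()
            while stack:
                node = stack.pop()
                if node in seen:
                    continue
                seen.add(node)
                if node.startswith(minor_prefix + "."):
                    return True
                stack.extend(graph.get(node, ()))
            return False

        for release in sorted(edges):
            ok = reaches_minor(release, "B", edges)
            print(f"{release}: {'can reach B' if ok else 'dead end for B'}")
        # Output:
        # A.1: can reach B
        # A.2: can reach B
        # A.3: dead end for B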

      With OTA-694 now landed, the previously-manual practice for avoiding the scenario above is automated.  If a customer first changes to the B channel before starting their upgrade, we instead recommend upgrading from A.1 to A.2.  A.2 does have an upgrade path to a release in version B, so they are just one more click away from getting there (roughly the behavior modeled in the sketch below).
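
      A hypothetical illustration of what that gating buys the customer, reusing edges and reaches_minor from the sketch above (best_hop is an assumption for illustration, not the OTA-694 implementation):

        def best_hop(current, target_minor, graph):
            """Among current's direct update targets that stay in the same minor,
            prefer the newest one that still has a path into target_minor.
            (Real code would need proper version ordering, not string max.)"""
            same_minor = current.split(".")[0]
            candidates = [
                v for v in graph.get(current, ())
                if v.startswith(same_minor + ".") and reaches_minor(v, target_minor, graph)
            ]
            return max(candidates, default=None)

        print(best_hop("A.1", "B", edges))  # A.2 (not the dead-end A.3)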

      Challenges:

      • To analyze the historical upgrade graph, you can use show-edges.py.  It's all in Git, so you can also get at it directly without going through Python.
      • It will be easy to make this problem much harder than it needs to be.  To consider the impact of our automation, I believe we only need to look at upgrades where:
        • The cluster was in a 4.y.z, in a *-4.(y+n) channel for some n ≥ 1 (e.g. 4.6.50 in eus-4.8).
        • The cluster updated to their 4.y's then-tip 4.y.z' (e.g. 4.6.60.  Doesn't actually need to have been the tip, the dead-end could be multiple releases long, but in practice, it seems unlikely that folks are selecting specific non-tip dead-end targets).
        • That target 4.y.z' had no update available to 4.(y+1) at that time (so it was then a dead end).  A filtering sketch for these criteria follows this list.
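
      A rough sketch of that filter, assuming we already have per-cluster update records; the UpdateEvent shape, the channel parsing, and the had_edge_to_next_minor lookup are all assumptions for illustration, not existing telemetry or graph-data APIs:

        import re
        from dataclasses import dataclass

        @dataclass
        class UpdateEvent:
            # Hypothetical per-cluster record of one update.
            from_version: str   # e.g. "4.6.50"
            to_version: str     # e.g. "4.6.60"
            channel: str        # e.g. "eus-4.8"
            timestamp: str      # when the update happened

        def minor(version):
            """Return the 4.y minor as an int, e.g. "4.6.50" -> 6."""
            return int(version.split(".")[1])

        def channel_minor(channel):
            """Return the channel's 4.y minor, e.g. "eus-4.8" -> 8, or None."""
            m = re.search(r"-4\.(\d+)$", channel)
            return int(m.group(1)) if m else None

        def is_dead_end_upgrade(event, had_edge_to_next_minor):
            """True when the cluster was aiming at a later minor (channel minor > y),
            stayed within 4.y, and the target had no edge into 4.(y+1) at that time.
            had_edge_to_next_minor(version, timestamp) is a placeholder for a lookup
            against historical graph-data (e.g. via show-edges.py or git history)."""
            y = minor(event.from_version)
            target_channel_minor = channel_minor(event.channel)
            if target_channel_minor is None or target_channel_minor <= y:
                return False  # no signal of intent to move to a later minor
            if minor(event.to_version) != y:
                return False  # they already left 4.y, so no dead end for them
            return not had_edge_to_next_minor(event.to_version, event.timestamp)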

      Note:

      This is a potential project for Maaz.  I propose he discuss it with the team at the August 15th team meeting.

      Definition of done:

      • Find out how many customers hit this issue before we changed the graph automation: they upgraded to the tip of a 4.y.z stream and ended up in a place with no upgrade path to 4.(y+1).  We can assume that customers in the 4.(y+1) channel or the 4.(y+2) EUS channels have the intent of upgrading to the next minor version.
      • Find out how many times we have added versions to the channels (as cincinnati-graph-data commits) where there was no upgrade path to the next minor version; a rough sketch of that scan follows this list.
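
      A very rough sketch of the commit scan, assuming the cincinnati-graph-data layout of per-channel YAML files under channels/ with a top-level versions list; the dead-end test (is_dead_end_at) is left as a placeholder, since reconstructing the graph as of each commit is the hard part (show-edges.py and git history would feed it):

        import subprocess
        import yaml  # PyYAML

        REPO = "cincinnati-graph-data"  # path to a local clone; an assumption

        def git(*args):
            return subprocess.run(["git", "-C", REPO, *args],
                                  capture_output=True, text=True, check=True).stdout

        def versions_at(commit, channel_file):
            """Versions listed in a channel file at a given commit (empty if absent)."""
            try:
                raw = git("show", f"{commit}:{channel_file}")
            except subprocess.CalledProcessError:
                return set()
            return set((yaml.safe_load(raw) or {}).get("versions", []))

        def added_versions(commit, channel_file):
            """Versions newly added to the channel by this commit."""
            return versions_at(commit, channel_file) - versions_at(f"{commit}~1", channel_file)

        def count_dead_end_additions(channel_file, is_dead_end_at):
            """Count commits that added at least one version with no path to the next
            minor at that time.  is_dead_end_at(version, commit) is a placeholder for
            a check against the historical graph."""
            commits = git("log", "--format=%H", "--", channel_file).split()
            count = 0
            for commit in commits:
                added = added_versions(commit, channel_file)
                if any(is_dead_end_at(v, commit) for v in added):
                    count += 1
            return count

        # e.g. count_dead_end_additions("channels/eus-4.8.yaml", my_dead_end_check)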

              Assignee: Maaz Shaikh (rh-ee-mashaikh, inactive)
              Reporter: Brenton Leanhardt (rh-ee-bleanhar)