
Type: Story
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: None
Labels:
None

Activity Type:
Future Sustainability
Blocked:
False
Blocked Reason:
None
Ready:
False
Epic Link:
None
Story Points:
None

Target Version:
None
Release Blocker:
None
Sprint:
None

We recently hit an issue where the micro-upgrade (nightly to nightly) tests were failing because of https://github.com/openshift/cluster-api/pull/253. The implied signal of this test failure was that the PR broke upgrades. In fact, the opposite was true: it was reverting a prior change which broke the release. It was therefore upgrade neutral.

In this case, the nightly to nightly test gave us the wrong answer: although it would not be possible to revert the earlier change between two production releases, it was obviously permissible never to have made the change at all.

We were also asked to consider a new test which would have mitigated this issue. Fortunately, there is existing guidance on which changes are permitted within a single release, and the original change did not follow it. Had it done so, we would have been able to revert the change safely, even between nightlies. We are considering how to add a test to cover this.

However, as this test requires forcing an intermediate state, it only works when using a stable base for upgrade testing. For example:

1. Release A
2. Nightly A introduces a permitted mutation
3. Nightly B introduces a second mutation, permitted from Nightly A but not from Release A
4. Nightly C
5. Nightly D
6. Release B

If we are only testing nightly to nightly, we don't catch the upgrade breakage in step 3 until we do release to release testing in step 6.

If, however, we always did release to nightly testing, it would have given us the right answer in both cases: https://github.com/openshift/cluster-api/pull/253 would have been permitted because it did not break upgrades from a release somebody might actually be running, and our proposed new test would catch the breakage above immediately in step 3, rather than at the last minute.
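The two scenarios above can be modelled with a small simulation. This is a minimal sketch under assumptions that are not in the ticket: each build is reduced to the set of builds it can safely upgrade from (`upgrades_from`), both strategies also run a release to release test whenever a release is cut, and "Nightly X" / "Nightly Y" are hypothetical names for the nightly carrying the original change and the nightly carrying the revert in PR 253.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Sequence, Tuple

STRATEGIES = ("nightly-to-nightly", "release-to-nightly")


@dataclass(frozen=True)
class Build:
    name: str
    is_release: bool
    # Hypothetical model: names of builds whose on-cluster state this build
    # can safely upgrade from.
    upgrades_from: frozenset = frozenset()


def upgrade_tests(builds: Sequence[Build], strategy: str) -> Iterator[Tuple[int, Build, Build]]:
    """Yield (step, base, target) for every upgrade test the strategy runs.

    Assumption: both strategies also test release to release when a release is cut.
    """
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    last_release: Optional[Build] = None
    prev: Optional[Build] = None
    for step, target in enumerate(builds, start=1):
        bases = []
        if strategy == "nightly-to-nightly" and prev is not None:
            bases.append(prev)
        if last_release is not None and (strategy == "release-to-nightly" or target.is_release):
            bases.append(last_release)
        for base in bases:
            yield step, base, target
        prev = target
        if target.is_release:
            last_release = target


def first_failure(builds: Sequence[Build], strategy: str) -> Optional[Tuple[int, str, str]]:
    """Return (step, base name, target name) for the first failing upgrade test, or None."""
    for step, base, target in upgrade_tests(builds, strategy):
        if base.name not in target.upgrades_from:
            return step, base.name, target.name
    return None


def fs(*names: str) -> frozenset:
    return frozenset(names)


# PR 253 case: Nightly Y reverts Nightly X, so it is fine from Release A but not from Nightly X.
revert = [
    Build("Release A", True),
    Build("Nightly X", False, fs("Release A")),
    Build("Nightly Y", False, fs("Release A")),
]

# The six-step example: Nightly B's mutation requires Nightly A's state.
chained = [
    Build("Release A", True),
    Build("Nightly A", False, fs("Release A")),
    Build("Nightly B", False, fs("Nightly A")),
    Build("Nightly C", False, fs("Nightly A", "Nightly B")),
    Build("Nightly D", False, fs("Nightly A", "Nightly B", "Nightly C")),
    Build("Release B", True, fs("Nightly A", "Nightly B", "Nightly C", "Nightly D")),
]

for label, builds in (("revert (PR 253)", revert), ("chained mutations", chained)):
    for strategy in STRATEGIES:
        print(f"{label:18} {strategy:19} -> {first_failure(builds, strategy)}")
```

Under these assumptions, nightly to nightly flags the revert at step 3 (Nightly X to Nightly Y) and only catches the chained breakage at step 6 (Release A to Release B), while release to nightly passes the revert and flags the chained breakage at step 3 (Release A to Nightly B).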

Ideally we would run both (nightly to nightly and release to nightly), provided we're prepared to override nightly to nightly test failures from time to time. However, as we would probably never override a release to nightly upgrade test failure, my gut feeling is that if capacity only allowed us to run one set of tests, release to nightly might be the more interesting one.
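The policy in the previous paragraph, running both jobs but only ever overriding nightly to nightly failures, could be expressed roughly as follows. This is a hypothetical sketch: `UpgradeResult`, `nightly_accepted` and the job name strings are illustrative and do not correspond to an existing CI API.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical job names used only for this illustration.
NIGHTLY_TO_NIGHTLY = "nightly-to-nightly"
RELEASE_TO_NIGHTLY = "release-to-nightly"


@dataclass(frozen=True)
class UpgradeResult:
    job: str  # NIGHTLY_TO_NIGHTLY or RELEASE_TO_NIGHTLY
    passed: bool


def nightly_accepted(results: Iterable[UpgradeResult], override_nightly_to_nightly: bool = False) -> bool:
    """Hypothetical gate: a release to nightly failure always rejects the nightly;
    a nightly to nightly failure rejects it unless someone has overridden it."""
    for result in results:
        if result.passed:
            continue
        if result.job == NIGHTLY_TO_NIGHTLY and override_nightly_to_nightly:
            continue
        return False
    return True


# PR 253-style case: nightly to nightly fails, release to nightly passes.
revert_results = [UpgradeResult(NIGHTLY_TO_NIGHTLY, False), UpgradeResult(RELEASE_TO_NIGHTLY, True)]
print(nightly_accepted(revert_results))                                   # False
print(nightly_accepted(revert_results, override_nightly_to_nightly=True))  # True
```

The override flag deliberately has no effect on release to nightly failures, matching the expectation that those would essentially never be overridden.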

Incidentally, we did not catch the release breakage at the time because the feature applies only to a specific vSphere configuration and not everything can run as a presubmit, but we did catch it the first time the periodics ran.

relates to

SHIPSTRAT-3 A successful nightly most nights

Refinement

Assignee:: Unassigned

Reporter:: Matthew Booth

Need Info From:: None

Contributors:: None

QA Contact:: None

Doc Contact:: None

Votes:: 0

Watchers:: 3

Created:: 2025/12/11 6:57 PM

Updated:: 2026/01/05 8:11 PM
