  1. OCP Technical Release Team
  2. TRT-2474

EC/Release to Nightly testing


    • Type: Story
    • Resolution: Unresolved
    • Priority: Major
    • Future Sustainability

      We recently hit an issue where the micro-upgrade (nightly to nightly) tests were failing due to https://github.com/openshift/cluster-api/pull/253 . The implied signal of this test failure was that the PR broke upgrades. In fact the opposite was true: it reverted a prior change which had broken the release. It was therefore upgrade neutral.

      In this case, the nightly to nightly test gave us an incorrect answer: although it would not be possible to revert the earlier change between 2 production releases, it is obviously permitted for a release to have never made the change at all.

      We were also asked to consider a new test which would have mitigated this issue. Fortunately, there is existing guidance on the changes permitted in a single release, which the original change did not follow. Had it followed that guidance, we would have been able to revert the change safely, even between the nightlies. We are considering how to add a test to cover this.

      However, as this test requires forcing an intermediate state, it only works when using a stable base for upgrade testing. For example:

      1. Release A
      2. Nightly A introduces a permitted mutation
      3. Nightly B introduces a second mutation, permitted from Nightly A but not from Release A
      4. Nightly C
      5. Nightly D
      6. Release B

      If we are only testing nightly to nightly, we don't catch the upgrade breakage in step 3 until we do release to release testing in step 6.

      If, however, we had always done release to nightly testing, it would have given us the right answer in both cases. https://github.com/openshift/cluster-api/pull/253 would have been permitted because it did not break upgrades from a release somebody might actually be running, and our proposed new test would catch the breakage above immediately in step 3, rather than at the last minute.
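
      As a rough illustration of the two cases above, here is a toy Python model. Everything in it is hypothetical and not taken from any real test suite: the `Build` type, the integer `level` standing in for accumulated mutations, and the rule in `upgrade_ok` (an upgrade may move forward by at most one mutation step and never backward) are assumptions standing in for the actual guidance, which is not spelled out here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Build:
    name: str
    level: int  # hypothetical count of mutations applied to persisted state
    is_release: bool = False


def upgrade_ok(source: Build, target: Build) -> bool:
    """Toy rule (an assumption): at most one forward mutation step per upgrade."""
    return 0 <= target.level - source.level <= 1


def nightly_to_nightly(builds):
    """Check each nightly against the nightly immediately before it."""
    nightlies = [b for b in builds if not b.is_release]
    return {cur.name: upgrade_ok(prev, cur)
            for prev, cur in zip(nightlies, nightlies[1:])}


def release_to_nightly(builds):
    """Check each nightly against the most recent release preceding it."""
    results = {}
    base = None
    for b in builds:
        if b.is_release:
            base = b
        elif base is not None:
            results[b.name] = upgrade_ok(base, b)
    return results


# The six-step scenario from the list above.
timeline = [
    Build("Release A", 0, True),
    Build("Nightly A", 1),  # permitted mutation
    Build("Nightly B", 2),  # permitted from Nightly A, not from Release A
    Build("Nightly C", 2),
    Build("Nightly D", 2),
    Build("Release B", 2, True),
]

# The revert case: a nightly makes a change and the next nightly reverts it.
revert = [
    Build("Release A", 0, True),
    Build("Nightly X", 1),  # introduces the change
    Build("Nightly Y", 0),  # reverts it
]

for label, builds in (("timeline", timeline), ("revert", revert)):
    print(label, "nightly to nightly:", nightly_to_nightly(builds))
    print(label, "release to nightly:", release_to_nightly(builds))
```

      In this toy model, nightly to nightly passes every step of the first timeline while release to nightly flags Nightly B. For the revert timeline the results flip: nightly to nightly flags Nightly Y, while release to nightly passes it.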

      Ideally we would run both (nightly to nightly and release to nightly), as long as we're prepared to override nightly to nightly tests from time to time. However, as we would probably never override a release to nightly upgrade test failure, my gut feeling is that if we were only going to run one set of tests for capacity reasons, release to nightly might be the more interesting test.

      Incidentally, we did not catch the release breakage at the time because the feature applies only to a specific vSphere configuration and not everything can run as a presubmit, but we did catch it the first time the periodics ran.

              Assignee: Unassigned
              Reporter: Matthew Booth (rhn-gps-mbooth)