Type: Spike
Resolution: Unresolved
Priority: Undefined
Fix Version/s: None
Affects Version/s: None
Labels:
- mco-tech-debt

Blocked: False
Blocked Reason: None
Ready: False
Intelligence Requested:
Market:

Cost of Delay: 0
WSJF: 0

SFDC Cases Links:
SFDC Cases Counter:
SFDC Cases Open:

There are certain flows within the Machine Config Daemon that are only executed whenever an upgrade is in progress. An example of this is the SSH key location migration flow in https://github.com/openshift/machine-config-operator/pull/3534. Testing and verifying that these flows work as intended is currently a manual, error-prone, and time-consuming process.

One reason this is so difficult is that today we cannot test the MCD offline; we depend on having an MCD running in a live cluster to test against. If it were possible to connect the MCD to a fake control plane (https://issues.redhat.com/browse/MCO-433), have it write to a fake filesystem (https://issues.redhat.com/browse/MCO-439), and mock its calls to system binaries (https://issues.redhat.com/browse/MCO-500), this would be moot: we could easily simulate upgrade scenarios and get much faster feedback with substantially less CI resource usage. Enabling that would require a non-trivial amount of work. In lieu of that, we may want to consider MCO-specific upgrade test jobs in CI.
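To make the "mock its calls to system binaries" idea concrete, here is a minimal sketch of one common pattern: the code under test receives a command runner as a parameter, so a test can pass in a fake that records calls and returns canned output. It is written in Python for brevity (the MCO itself is written in Go), and every name in it (`FakeRunner`, `booted_os_version`, the `example-os-tool` binary) is hypothetical. It is not how MCO-500 proposes to do it, only an illustration of the general approach.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes an argv list and returns the command's stdout.
Runner = Callable[[Sequence[str]], str]


def real_runner(argv: Sequence[str]) -> str:
    """Run the command on the host and return its stdout."""
    result = subprocess.run(list(argv), check=True, capture_output=True, text=True)
    return result.stdout


class FakeRunner:
    """Records every call and replies with canned output instead of executing."""

    def __init__(self, responses: dict):
        self.responses = responses  # maps an argv tuple to the stdout to return
        self.calls = []

    def __call__(self, argv: Sequence[str]) -> str:
        self.calls.append(tuple(argv))
        return self.responses.get(tuple(argv), "")


def booted_os_version(run: Runner) -> str:
    """Hypothetical daemon helper that asks a (made-up) binary for the OS version."""
    return run(["example-os-tool", "version"]).strip()


if __name__ == "__main__":
    # No real binary is executed here; the fake answers instead.
    fake = FakeRunner({("example-os-tool", "version"): "9.2\n"})
    print(booted_os_version(fake))
    print(fake.calls)
```

In production the same helper would be called with `real_runner`; in an offline test the fake lets an upgrade scenario be simulated by changing the canned responses.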

While we have the OpenShift e2e upgrade tests, that suite is concerned only with the holistic state of the cluster while the upgrade is in progress and once the cluster has finished upgrading. It is not (and should not be) concerned with the implementation details of individual OpenShift components. We could add tests to that suite, but because our needs are much more granular, they would go against the spirit of the OpenShift e2e test suite.

Implementation Details:

Within OpenShift CI, it is possible to bring up a cluster that is upgrading to the current release and either wait for it to complete or run some arbitrary tests against it while the upgrade is in progress.
For tests where we want to examine the cluster's post-upgrade state, it might make sense to wait until the current OpenShift e2e upgrade suite has completed and then run our tests afterward, which avoids spinning up another cluster. For example, the test I have in mind for https://github.com/openshift/machine-config-operator/pull/3534 would validate that the SSH keys changed location and would be a rather quick check at the end of that test suite. With this in mind, these tests could piggyback on the preexisting upgrade test jobs and would only need a new Makefile target and an additional step in our CI config to run them.
I cannot imagine a scenario where we might need to monitor an in-progress cluster upgrade to infer state about the MCO's upgrade code paths, but I cannot completely exclude that possibility. If such a scenario arises, we'll need to add an MCO-specific cluster upgrade job. That job might not need to run on each PR; it could be opt-in for cases where we know we've changed the upgrade code paths.
These tests should live in separate folders within the MCO repo (e.g., test/e2e-post-upgrade and test/e2e-upgrade-in-progress) and should have separate Makefile targets (e.g., $ make test-e2e-post-upgrade and $ make test-e2e-upgrade-in-progress, or similar).
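As an illustration of how small the post-upgrade SSH key check could be, here is a minimal sketch of its logic. It is written in Python for brevity, whereas a real test under test/e2e-post-upgrade would presumably be a Go test like the rest of the MCO suite. The two key paths are assumptions (the issue does not state them) and should be replaced with whatever the migration in the linked PR actually uses; the function and helper names are hypothetical.

```python
import tempfile
from pathlib import Path

# ASSUMED paths, relative to a node's root filesystem. They are not given in
# the issue; substitute the locations the real migration reads and writes.
OLD_KEY_PATH = Path("home/core/.ssh/authorized_keys")
NEW_KEY_PATH = Path("home/core/.ssh/authorized_keys.d/ignition")


def check_ssh_keys_migrated(root: Path) -> list:
    """Return a list of problems found; an empty list means the check passed."""
    problems = []
    if (root / OLD_KEY_PATH).exists():
        problems.append(f"old key file still present: {OLD_KEY_PATH}")
    new_path = root / NEW_KEY_PATH
    if not new_path.is_file():
        problems.append(f"new key file missing: {NEW_KEY_PATH}")
    elif not new_path.read_text().strip():
        problems.append(f"new key file is empty: {NEW_KEY_PATH}")
    return problems


def make_fake_root(base: Path, key_path: Path, contents: str) -> Path:
    """Build a throwaway directory tree containing a single key file."""
    target = base / key_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(contents)
    return base


if __name__ == "__main__":
    # Simulate a node that has already been migrated.
    with tempfile.TemporaryDirectory() as tmp:
        root = make_fake_root(Path(tmp), NEW_KEY_PATH, "ssh-ed25519 EXAMPLEKEY core\n")
        print(check_ssh_keys_migrated(root))
```

Against a live cluster, `root` would be replaced by whatever access to the node's filesystem the e2e helpers provide; the temporary directory here only stands in for it so the logic can be exercised without a cluster.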

Done When:

It is determined whether we should consider an MCO-specific upgrade test suite.
We have an idea of what tests we'd like to run and whether those tests depend upon an in-progress cluster upgrade or whether we can run them at the end of a cluster upgrade.
We've added the necessary CI config, Makefile targets, etc., and have at least one MCO upgrade test.

Assignee: Unassigned

Reporter: Zack Zlotnik

Created: 2023/02/17 11:22 PM

Updated: 2024/02/14 8:03 PM
