Decomposed e2e tests
Workloads Sprint 236 through Workloads Sprint 245
(Text copy pasted from https://docs.google.com/document/d/1OkE-h4d16lDvXaJpiNEYuv1215LgQ1VhzBsCq_nFHGc/edit#)
Teams are writing custom test frameworks for their component in their own repo (instead of contributing the tests to the centralized openshift/origin repo). This has the benefit of co-locating test & feature – enabling teams to merge both in a single PR, but means that the tests are not run on all PRs across the org, tracked by sippy, or run in multiple configurations in periodics (proxied, disconnected, ipv6), etc.
Today, for example, the rebase team opens a PR against openshift/kubernetes with kube 1.26 content while openshift/origin contains kube 1.25 e2e tests. The openshift-tests binary from a 1.25 kube is therefore driving the vast majority of testing against the rebase PR. When the rebase PR finally merges, there is a challenge period where the rebase team must subsequently land a 1.26 test suite update PR in openshift/origin. This can take a while since the 1.26 combination was not exhaustively exercised on all platforms by the initial openshift/kubernetes PR.
Ideally, the kube 1.26 PR to openshift/kubernetes could also contain the 1.26 e2e tests that should run against it. Abstracting this across all teams, it would mean that each team could house e2e tests in their own repo that ultimately run, in aggregate, as part of a suite replacing the historically monolithic openshift-tests.
Non-rebase Team Use case:
As a member of the etcd team, I want to open a single PR in openshift/etcd which contains a bump of my upstream component and all e2e tests which should run against a cluster running the new version. Pre-merge, the e2e tests in my PR replace any existing etcd e2e tests committed in the etcd repo. e2e suites corresponding to component builds in my release payload from other OpenShift repos run against my ephemeral clusters under test. When my PR merges and my image is promoted, the new etcd e2e suite content will be associated with subsequent release payloads and run when other PRs are opened across the organization.
Implementation details (Runtime Assembly / Single Pod)
Test binaries are built into each release payload component image at a conventionally established location, e.g. "/openshift-tests/<component-name>". Once a cluster has been installed, the openshift-tests binary inspects the components of the running cluster's payload, extracts the binaries to local storage, and executes them. openshift-tests then aggregates the results of the executed binaries.
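The path convention and the aggregation step can be sketched in Go as follows. This is only an illustration under stated assumptions: the Result and Summary types, the function names, and the sample test names are hypothetical and not part of openshift-tests; only the "/openshift-tests/<component-name>" location comes from the text above. Extraction from images and execution of the binaries are left out.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// Result is a hypothetical per-test outcome reported by one component binary.
type Result struct {
	Component string
	Test      string
	Passed    bool
}

// Summary holds aggregate counts across all component binaries.
type Summary struct {
	Passed, Failed int
	FailedTests    []string
}

// binaryPath returns the conventional in-image location of a component's
// test binary, following the "/openshift-tests/<component-name>" example.
func binaryPath(component string) string {
	return path.Join("/openshift-tests", component)
}

// aggregate merges the results reported by every component binary.
func aggregate(perComponent map[string][]Result) Summary {
	var s Summary
	for _, results := range perComponent {
		for _, r := range results {
			if r.Passed {
				s.Passed++
				continue
			}
			s.Failed++
			s.FailedTests = append(s.FailedTests, r.Component+": "+r.Test)
		}
	}
	// Map iteration order is random, so sort for a stable report.
	sort.Strings(s.FailedTests)
	return s
}

func main() {
	fmt.Println(binaryPath("etcd"))
	s := aggregate(map[string][]Result{
		"etcd": {
			{"etcd", "defrag works", true},
			{"etcd", "leader election", false},
		},
		"kube-apiserver": {
			{"kube-apiserver", "serves healthz", true},
		},
	})
	fmt.Printf("passed=%d failed=%d %v\n", s.Passed, s.Failed, s.FailedTests)
}
```

Keeping aggregation separate from execution means the same summary logic would apply whether the binaries run in one pod or, later, in several.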
Where does the openshift-tests image come from?
If it came from the payload under test, we’d have to run openshift-tests on nodes whose architecture matches that of the payload.
CI operator pulls the openshift-tests image from a different location.
Have we tried to run ARM jobs on ARM build farms?
Nikolaos - we tried and failed, need to get information about what failed.
Could we accept that ARM is lagging/broken during kube updates for the first iteration?
We’d run these external tests if we can find them and local tests if we cannot.
ARM would lag weeks behind amd64, but we’d get the amd64 velocity bump.
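The fallback rule (external tests if found, local tests otherwise) could look roughly like this in Go. The Source type, the chooseSource function, and the discovery map are hypothetical names used only for illustration; how binaries are actually discovered is not specified here.

```go
package main

import "fmt"

// Source says where a component's e2e tests come from in this sketch.
type Source string

const (
	External Source = "external" // binary found in the component image
	Local    Source = "local"    // tests compiled into openshift-tests
)

// chooseSource picks external tests when a binary was found for the
// component and falls back to local tests otherwise.
func chooseSource(component string, found map[string]bool) Source {
	if found[component] {
		return External
	}
	return Local
}

func main() {
	// Hypothetical discovery result, e.g. on a cluster where only some
	// component binaries could be extracted and run.
	found := map[string]bool{"etcd": true}
	for _, c := range []string{"etcd", "kube-apiserver"} {
		fmt.Printf("%s: %s\n", c, chooseSource(c, found))
	}
}
```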
An amd64 build farm cluster cannot run release payload component pods from an ARM cluster. We might be able to mitigate this if we only ran these pods on the ARM build farm.
May require statically compiled binaries so that a component e2e binary can run on a different version of RHEL than openshift-tests. Image size implications?
The test pod needs to run on a CI cluster node with the same arch as the cluster under test.
It’s “easy” for a PR to add a new test and have that test run in pre-merge testing.
It’s easier to ensure test randomization and interleaving from different SIGs if one binary is orchestrating everything.
It’s easier to implement, since the coordination mechanism will exist only inside the openshift/origin repository.
It is possible to evolve this into Runtime Assembly / Multi-Pod.
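To illustrate why a single orchestrating binary makes randomization and interleaving simpler: with all test names in one process, they can be merged and shuffled with one seeded generator, so a run is reproducible from its seed. The sketch below is an assumption about how this might be done, not how openshift-tests does it; the interleave function and the test names are hypothetical.

```go
package main

import (
	"fmt"
	"math/rand"
)

// interleave merges test names from all component suites and shuffles them
// with a seeded generator, so the same seed reproduces the same order.
func interleave(suites [][]string, seed int64) []string {
	var all []string
	for _, s := range suites {
		all = append(all, s...)
	}
	r := rand.New(rand.NewSource(seed))
	r.Shuffle(len(all), func(i, j int) { all[i], all[j] = all[j], all[i] })
	return all
}

func main() {
	suites := [][]string{
		{"[sig-etcd] defrag", "[sig-etcd] leader election"},
		{"[sig-api-machinery] healthz", "[sig-api-machinery] watch"},
	}
	// Printing the seed alongside the order lets a failing run be replayed.
	const seed = 42
	fmt.Println("seed:", seed)
	for _, name := range interleave(suites, seed) {
		fmt.Println(name)
	}
}
```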
Notes to myself:
- each test suite is going to be accessible through the Ginkgo framework (or a generic framework built on top of Ginkgo)
- the wrapper, still to be implemented, will pull all images that contain a test suite binary built with the mentioned frameworks and run the individual tests (based on some spec/configuration)
- Maciej's proposal: https://github.com/openshift/enhancements/pull/1291
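As a rough idea of what "some spec/configuration" for the wrapper might look like, here is a minimal Go sketch. Everything in it is an assumption made for illustration: the SuiteSpec fields, the JSON layout, the placeholder image name, and the use of a regular expression to select tests are not taken from the linked proposal.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// SuiteSpec is a hypothetical configuration entry for one component suite.
type SuiteSpec struct {
	Image  string `json:"image"`  // image that carries the test binary
	Binary string `json:"binary"` // path of the binary inside the image
	Focus  string `json:"focus"`  // regular expression selecting tests to run
}

// parseSpecs decodes a JSON list of suite specs.
func parseSpecs(data []byte) ([]SuiteSpec, error) {
	var specs []SuiteSpec
	if err := json.Unmarshal(data, &specs); err != nil {
		return nil, err
	}
	return specs, nil
}

// selectTests returns the tests whose names match the spec's focus expression.
func selectTests(spec SuiteSpec, available []string) ([]string, error) {
	re, err := regexp.Compile(spec.Focus)
	if err != nil {
		return nil, err
	}
	var picked []string
	for _, name := range available {
		if re.MatchString(name) {
			picked = append(picked, name)
		}
	}
	return picked, nil
}

func main() {
	// "etcd" is a placeholder image name, not a real pull spec.
	data := []byte(`[{"image":"etcd","binary":"/openshift-tests/etcd","focus":"^\\[sig-etcd\\]"}]`)
	specs, err := parseSpecs(data)
	if err != nil {
		panic(err)
	}
	picked, err := selectTests(specs[0], []string{"[sig-etcd] defrag", "[sig-storage] volumes"})
	if err != nil {
		panic(err)
	}
	fmt.Println(specs[0].Binary, picked)
}
```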