Type: Epic
Resolution: Unresolved
Priority: Normal
Fix Version/s: None
Affects Version/s: None
Labels:
- e2e
- tests

Epic Name:
Decomposed e2e tests
Blocked:
False
Blocked Reason:
None
Ready:
False
Color Status:
Not Selected
Epic Status:
To Do
Size:
L

Sprint:
Workloads Sprint 236, Workloads Sprint 237, Workloads Sprint 238, Workloads Sprint 239, Workloads Sprint 240, Workloads Sprint 241, Workloads Sprint 242, Workloads Sprint 243, Workloads Sprint 244, Workloads Sprint 245, Workloads Sprint 246, Workloads Sprint 247, Workloads Sprint 248


Intro

(Text copy pasted from https://docs.google.com/document/d/1OkE-h4d16lDvXaJpiNEYuv1215LgQ1VhzBsCq_nFHGc/edit#)

Teams are writing custom test frameworks for their components in their own repos (instead of contributing the tests to the centralized openshift/origin repo). This co-locates test and feature, so teams can merge both in a single PR, but it means the tests are not run on all PRs across the org, are not tracked by sippy, and are not run in the multiple configurations covered by periodics (proxied, disconnected, ipv6, etc.).

Today, for example, the rebase team opens a PR against openshift/kubernetes with kube 1.26 content while openshift/origin contains kube 1.25 e2e tests. The openshift-tests binary from a 1.25 kube is therefore driving the vast majority of testing against the rebase PR. When the rebase PR finally merges, there is a challenge period where the rebase team must subsequently land a 1.26 test suite update PR in openshift/origin. This can take a while, since the 1.26 combination was not exhaustively exercised on all platforms by the initial openshift/kubernetes PR.

Ideally, the kube 1.26 PR to openshift/kubernetes could also contain the 1.26 e2e tests that should run against it. Abstracting this across all teams, it would mean that each team could house e2e tests in their own repo that ultimately run, in aggregate, as part of a suite replacing the historically monolithic openshift-tests.

Non-rebase Team Use case

As a member of the etcd team, I want to open a single PR in openshift/etcd which contains a bump of my upstream component and all e2e tests which should run against a cluster running the new version. Pre-merge, the e2e tests in my PR replace any existing etcd e2e tests committed in the etcd repo. e2e suites corresponding to component builds in my release payload from other OpenShift repos run against my ephemeral clusters under test. When my PR merges and my image is promoted, the new etcd e2e suite content will be associated with subsequent release payloads and run when other PRs are opened across the organization.

Implementation details (Runtime Assembly / Single Pod)

Test binaries are built into each release payload component image at a conventionally established location, e.g. "/openshift-tests/<component-name>". Once a cluster has been installed, the openshift-tests binary inspects the components of the running cluster's payload, extracts the test binaries to local storage, and executes them. openshift-tests then aggregates the results of the executed binaries.
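
The flow above could be sketched roughly as follows. This is only an illustration, not the openshift-tests implementation: the function names, the `Result` type, and the injectable `runner` are hypothetical, and only the "/openshift-tests/<component-name>" path convention comes from the text. Extraction from the payload images is assumed to have already happened into a local directory that mirrors the in-image layout.

```python
import subprocess
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, Iterable, List

# Conventional in-image location from the text: /openshift-tests/<component-name>
CONVENTIONAL_DIR = "/openshift-tests"


@dataclass
class Result:
    component: str
    exit_code: int


def binary_path_in_image(component: str) -> str:
    """Return the conventional path of a component's test binary inside its image."""
    return f"{CONVENTIONAL_DIR}/{component}"


def find_extracted_binaries(extract_root: Path, components: Iterable[str]) -> Dict[str, Path]:
    """Map each component to its extracted binary, skipping components without one.

    Assumption (hypothetical): extraction mirrors the in-image layout under
    extract_root, e.g. <extract_root>/openshift-tests/<component>.
    """
    found = {}
    for component in components:
        candidate = extract_root / binary_path_in_image(component).lstrip("/")
        if candidate.is_file():
            found[component] = candidate
    return found


def run_binary(path: Path) -> int:
    """Default runner: execute the binary and return its exit code."""
    return subprocess.run([str(path)], check=False).returncode


def run_all(binaries: Dict[str, Path],
            runner: Callable[[Path], int] = run_binary) -> List[Result]:
    """Execute every discovered binary and collect one result per component."""
    return [Result(name, runner(path)) for name, path in sorted(binaries.items())]


def aggregate(results: List[Result]) -> bool:
    """Overall pass only if every component suite exited with 0."""
    return all(r.exit_code == 0 for r in results)
```

Components without a test binary are skipped rather than treated as failures, which matches the idea of running external tests only where they can be found. A real aggregator would presumably merge per-test results (for example JUnit output) rather than exit codes alone.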

Questions

Where does the openshift-tests image come from?
- If it came from the payload under test, we’d already have to run openshift-tests on nodes whose architecture matches the payload.
- CI operator pulls the openshift-tests image from a different location.
Have we tried to run ARM jobs on ARM build farms?
- Nikolaos - we tried and failed, need to get information about what failed.
Could we accept that arm is lagging/broken during kube updates for the first iteration?
- We’d run these external tests if we can find them and local tests if we cannot
- Arm would lag for weeks behind amd, but we’d get the amd velocity bump
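
The fallback rule in the last answer (external tests if they can be found, local tests otherwise) might look like the sketch below. The function name and the shape of both dictionaries are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple, Union


def choose_suite(component: str,
                 external_binaries: Dict[str, str],
                 local_suites: Dict[str, List[str]]) -> Tuple[str, Union[str, List[str]]]:
    """Prefer the component's external test binary; fall back to local tests.

    external_binaries maps component -> path of an extracted test binary.
    local_suites maps component -> test names compiled into openshift-tests.
    """
    if component in external_binaries:
        return ("external", external_binaries[component])
    return ("local", local_suites.get(component, []))
```

Under this rule, an architecture for which no external binary could be extracted would keep running the older local tests, which is the lag described above.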

Cons

An amd64 build farm cluster cannot run release payload component pods from an ARM cluster. We might be able to mitigate this if we only ran these pods on the ARM build farm.
May require statically compiled binaries to allow a component e2e binary to run on a different version of RHEL than openshift-tests. Image size implications?
Need to run test pod on the CI cluster on a node with the same arch as the cluster under test.

Pros

It’s “easy” to have a PR add a new test and that test will run in pre-merge testing.
It’s easier to ensure test randomization and interleaving from different sigs if one binary is orchestrating everything.
It’s easier to implement, since the coordination mechanism will exist only inside the openshift/origin repository.
Possible to evolve this into the Runtime Assembly / Multi-Pod.

Notes:

each test suite is going to be accessible through the ginkgo framework (or a generic framework built on top of the ginkgo framework)
the to-be-implemented wrapper will pull all images with a test suite binary built from the mentioned frameworks and run the individual tests (based on some spec/configuration)
Maciej's proposal: https://github.com/openshift/enhancements/pull/1291
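
As a rough illustration of running individual tests "based on some spec/configuration", the sketch below filters test names per suite using glob patterns. The spec format (`include` / `exclude` lists) and the function name are assumptions made for this example; the actual format is not defined here.

```python
from fnmatch import fnmatchcase
from typing import Dict, List


def select_tests(available: Dict[str, List[str]],
                 spec: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Filter each suite's test names by the spec's include/exclude glob patterns.

    available maps suite name -> test names reported by that suite's binary.
    spec is a hypothetical config: {"include": [...], "exclude": [...]}.
    """
    include = spec.get("include", ["*"])  # default: run everything
    exclude = spec.get("exclude", [])
    selected = {}
    for suite, tests in available.items():
        kept = [
            t for t in tests
            if any(fnmatchcase(t, p) for p in include)
            and not any(fnmatchcase(t, p) for p in exclude)
        ]
        if kept:  # drop suites with nothing left to run
            selected[suite] = kept
    return selected
```

Glob patterns treat `[` as the start of a character class, so test names carrying bracketed tags would need escaping or a different matching scheme.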

Links

TODO:

onboard another repository and produce an image with the openshift-tests binary (e.g. for the secondary scheduler operator): learning experience for what needs to be set in place + writing down the instructions for newbies
sync with the release team (or check https://docs.ci.openshift.org/docs/how-tos/onboarding-a-new-component/) on how to build a new operator/operand image for optional components after each PR merges, so the new image is available for optional e2es running over repositories whose images are part of the core payload. We use cpass for some optional operators. We need something that gets built right away and made available (even if it means the CI will not test images for production). Is this even possible? What about Konflux? This might require a broader survey of which repositories can be included in the testing framework.
extend the repository with a new ci job combining the upstream k8s tests and the new image
write a blog post about the experience with step-by-step guidance on how to onboard other repositories, or alternatively extend https://github.com/openshift/enhancements/blob/master/enhancements/testing/extended-platform-tests.md.
during the process of onboarding the new repository identify missing pieces and suggest improvements of the current testing framework
