Our CI jobs take a very long time. As our product gains more functionality, the jobs are approaching the 4-hour time limit, which leaves us with little margin for infrastructure issues or other slowdowns.
This spike is to investigate ways to speed up our CI jobs.
Ideas to explore revolve around parallelizing tests:
- When testing nodes, spawn one thread per node instead of looping over the nodes sequentially
- In the proxy suite, avoid a separate SSH call for each variable and each service; batch the checks into fewer calls
- Run non-conflicting suites, such as storage and network, simultaneously
- Run some subtests in separate threads (except MachineSet node and BYOH node creation, which conflict due to the machine-approver setting)
Also find other places to improve, including where and how to fail fast.