-
Story
-
Resolution: Unresolved
Spot instances are way cheaper: roughly 70-90% cheaper than on-demand. Downside: they can theoretically get yanked at any time. In practice, though, this tends to happen to <1% of instances per 24h (according to James Russell). So for CI e2e jobs, which last O(2h), the likelihood of an interruption is very small... and as long as we can distinguish flakes that happen for this reason, even manual /retest-ing makes this worth the savings if the real numbers are anywhere close to the above estimates.
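Back-of-the-envelope version of the argument above, as a rough sketch. It assumes interruptions are independent with a constant rate (so the 1%/24h figure scales down to a 2h job), and it pessimistically bills every interrupted attempt as a full run. The function names are made up for illustration; the inputs are just the estimates quoted above.

```python
def interruption_probability(rate_per_day: float, job_hours: float) -> float:
    """Chance of at least one interruption during a job.

    Assumes a constant, independent interruption rate (an assumption,
    not something AWS guarantees).
    """
    return 1.0 - (1.0 - rate_per_day) ** (job_hours / 24.0)


def relative_cost(discount: float, p_interrupt: float) -> float:
    """Expected spot cost as a fraction of the on-demand cost.

    Models "/retest until it passes": the expected number of attempts is
    1 / (1 - p). Each attempt is billed as a full run, which overstates
    the cost of attempts that were cut short.
    """
    return (1.0 - discount) / (1.0 - p_interrupt)


# Estimates from the ticket: <1% interruptions per 24h, jobs of roughly 2h.
p = interruption_probability(rate_per_day=0.01, job_hours=2.0)
print(f"interruption chance per 2h job: {p:.4%}")

for discount in (0.70, 0.90):
    cost = relative_cost(discount, p)
    print(f"discount {discount:.0%}: expected cost is {cost:.2%} of on-demand")
```

Under these assumptions the per-job interruption chance comes out below 0.1%, so the retest overhead barely moves the expected cost away from the raw 70-90% discount.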
Links from James:
- AWS Spot Instances - account operators guide
- Spot Instance and cost effecient cluster usage techniques - Technical Enablement
Hive already has some accommodation for spot instances in MachinePools and the hibernation controller, so it should be possible to request spot instances through... the install-config?
Problem for ClusterPools!
I believe we terminate (delete) spot instances on hibernation, and let MAPI recreate them when we resume the cluster. But when we create the default MachinePool to go with a ClusterPool cluster, we're not copying out (the relevant portions of) the install-config. So upon resume, I believe we'll end up creating the wrong instance types. See HIVE-2256 for more background on this. But I think what it means is that spot instances + clusterpools is dead in the water until we can address that card. Which by extension means:
- We can't reasonably cut our OSCI clusterpools over to spot instances until (at least this part of) HIVE-2256 is addressed.
- e2e-pool. If we use spot instances for the pool we create inside the test... it might run. It might even "succeed". But at some point in there (on resume from hibernation) it'll end up recreating the workers as on-demand instances. At best, this would reduce our cost savings.