Type: Story
Resolution: Unresolved
Priority: Undefined
Fix Version/s: None
Affects Version/s: None
Labels:
None

Blocked:
False
Blocked Reason:
None
Ready:
False
Epic Link:
CI cost savings

Spot instances are way cheaper: something like 70-90% cheaper. Downside: they can theoretically get yanked at any time. In practice, though, this tends to happen at a rate of <1%/24h (according to James Russell). So for CI e2e jobs, which last O(2h), the likelihood of an interruption is very small... and as long as we can distinguish flakes that happen for this reason, even manual /retest-ing makes this worth the savings if the real numbers are anywhere close to the above estimates.
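A rough back-of-the-envelope check of that claim, as a sketch only. It assumes the <1%/24h figure is a constant, independent per-instance interruption rate, that a job fails if any one of its instances is reclaimed, and that an interrupted run is retried from scratch at full cost. The 6-instance cluster size and the function names are hypothetical, chosen for illustration; the 70% discount is the low end of the range quoted above.

```python
def interruption_probability(daily_rate, job_hours, instances=1):
    """Chance that at least one instance is reclaimed during the job.

    Assumes a constant, independent per-instance rate (an assumption,
    not something the ticket states).
    """
    per_instance = 1.0 - (1.0 - daily_rate) ** (job_hours / 24.0)
    return 1.0 - (1.0 - per_instance) ** instances


def relative_cost(discount, p_interrupt):
    """Expected spot cost as a fraction of on-demand cost.

    Pessimistic model: every interrupted run is paid for in full and
    retried until one succeeds, so expected runs = 1 / (1 - p).
    """
    return (1.0 - discount) / (1.0 - p_interrupt)


if __name__ == "__main__":
    # Hypothetical cluster of 6 instances running a 2h e2e job.
    p = interruption_probability(daily_rate=0.01, job_hours=2, instances=6)
    print(f"chance a 2h job is interrupted: {p:.2%}")
    print(f"cost vs on-demand at 70% discount: {relative_cost(0.70, p):.1%}")
```

Under these assumptions the retry overhead is a fraction of a percent, so the savings stay close to the raw spot discount.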

Links from James:

Hive already has some accommodation for spot instances in MachinePools and the hibernation controller, so it should be possible to request spot instances through... the install-config?

Problem for ClusterPools!

I believe we terminate (delete) spot instances on hibernation, and let MAPI recreate them when we resume the cluster. But when we create the default MachinePool to go with a ClusterPool cluster, we're not copying over (the relevant portions of) the install-config. So upon resume, I believe we'll end up creating the wrong kind of instances (on-demand rather than spot). See HIVE-2256 for more background on this. But I think what it means is that spot instances + clusterpools is dead in the water until we can address that card. Which by extension means:

We can't reasonably cut our OSCI clusterpools over to spot instances until (at least this part of) HIVE-2256 is addressed.
The same goes for e2e-pool. If we use spot instances for the pool we create inside the test... it might run. It might even "succeed". But at some point in there it'll end up recreating the instances as on-demand. At best, this would reduce our cost savings.

blocks

RFE-5402 Enable Hive to support Spot Instances within Cluster Pools For Significantly Reduced AWS OpEx

Under Review

relates to

HIVE-2488 Support Spot Instances in Hive Cluster Pools to reduce AWS OpEx

Closed

links to

openshift/hive#2214: DNM: PoC Spot Instances for masters and workers

Assignee: Eric Fried

Reporter: Eric Fried

Votes: 0

Watchers: 4

Created: 2024/02/13 6:43 PM

Updated: 2025/01/13 5:59 PM
