Type: Epic
Resolution: Unresolved
Priority: Undefined
Fix Version/s: None
Affects Version/s: None
Labels:
- 4.Next-candidate

Epic Name: ClusterPools + MachinePools: Rethink the whole thing
Blocked: False
Blocked Reason: None
Ready: False
Color Status: Not Selected
Epic Status: To Do

For reasons lost to history (read: I haven't dug into it yet) the ClusterPool controller creates a MachinePool with each pool cluster by default. This can be optionally disabled via ClusterPool.spec.skipMachinePools.

The problem is that the MachinePool we generate has hardcoded values that match the default worker pool in the install-config only by coincidence. If anything in the install-config is nonstandard, your spoke MachineSets can end up as a non-intuitive combination of the install-config values and the hardcoded MachinePool values:

- If your install-config's instance type doesn't match the one we hardcode in the MachinePool (e.g. m5.xlarge for AWS), the install-config will win... unless Machines are deleted, in which case MAPI will recreate them with the MachinePool's instance type. The same applies to other fields under platform.
- We hardcode replicas to 3, meaning whatever replica count you have in your install-config is effectively ignored.
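
To make the mismatch concrete, here is a rough model of the behavior described above. It is a simplified sketch, not Hive's actual code: the function name, the dict layout, and the example install-config values (m6g.xlarge, 5 replicas) are all hypothetical. Only m5.xlarge and the replica count of 3 come from this issue.

```python
# Values this issue says are hardcoded in the generated MachinePool (AWS case).
HARDCODED_POOL = {"instance_type": "m5.xlarge", "replicas": 3}


def effective_worker_state(install_config_workers, machines_recreated):
    """Simplified model of what the spoke's worker Machines end up looking like.

    install_config_workers: hypothetical dict with the install-config's
    worker instance type and replica count.
    machines_recreated: True once Machines have been deleted and recreated.
    """
    # The MachinePool's replica count overrides whatever the install-config asked for.
    replicas = HARDCODED_POOL["replicas"]
    if machines_recreated:
        # Recreated Machines pick up the MachinePool's instance type.
        instance_type = HARDCODED_POOL["instance_type"]
    else:
        # Machines created at install time keep the install-config's type.
        instance_type = install_config_workers["instance_type"]
    return {"instance_type": instance_type, "replicas": replicas}


# Hypothetical nonstandard install-config worker pool.
workers = {"instance_type": "m6g.xlarge", "replicas": 5}

print(effective_worker_state(workers, machines_recreated=False))
print(effective_worker_state(workers, machines_recreated=True))
```

In this model the requested 5 replicas never take effect, and the instance type silently changes once Machines are recreated.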

We do properly set up RBAC for the owner of the ClusterClaim, so that user would be able to edit the MachinePool – but we only allow editing the replica count!

Bringing the matter more to the foreground, ACM is starting to consume both ClusterPool and MachinePool, so we need to be considering their UX with whatever we decide here.

We also need to consider that MAPI is on the way out – but CAPI isn't here yet – so we need to be careful not to invest too much in making the existing thing perfect; and we can't expect MAPI changes.

Initial thoughts:

- Consider changing to default skipMachinePools: true. This is technically a breaking change, so it will need to be thought out carefully. But I think it's pretty safe given the limitations of what's happening today.
- Populate MachinePool based on values from install-config. This is where we want to be careful about how much we're investing, as this will be a nontrivial amount of work – both code and test surface.
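
As a rough illustration of the second option, the sketch below derives a MachinePool-shaped dict from an already-parsed install-config. It is an assumption-laden outline, not a proposed implementation: the function name and the MachinePool naming scheme are hypothetical, it handles AWS instance type and replicas only, and it falls back to the values hardcoded today when the install-config omits them.

```python
# Fallbacks mirror the values this issue says are hardcoded today (AWS case).
DEFAULT_INSTANCE_TYPE = "m5.xlarge"
DEFAULT_REPLICAS = 3


def machine_pool_from_install_config(install_config, cluster_name):
    """Build a MachinePool-like dict from the install-config's worker pool.

    Hypothetical helper: install_config is the parsed install-config as a
    plain dict; only replicas and the AWS instance type are carried over.
    """
    # Find the compute pool named "worker"; use an empty pool if it is absent.
    worker = next(
        (p for p in install_config.get("compute", []) if p.get("name") == "worker"),
        {},
    )
    aws = (worker.get("platform") or {}).get("aws") or {}
    replicas = worker.get("replicas")
    return {
        "apiVersion": "hive.openshift.io/v1",
        "kind": "MachinePool",
        # Naming scheme is an assumption made for this sketch.
        "metadata": {"name": f"{cluster_name}-worker"},
        "spec": {
            "clusterDeploymentRef": {"name": cluster_name},
            "name": "worker",
            # Check for None explicitly so that an explicit 0 is honored.
            "replicas": DEFAULT_REPLICAS if replicas is None else replicas,
            "platform": {"aws": {"type": aws.get("type", DEFAULT_INSTANCE_TYPE)}},
        },
    }


# Hypothetical install-config with a nonstandard worker pool.
install_config = {
    "compute": [
        {"name": "worker", "replicas": 5, "platform": {"aws": {"type": "m6g.xlarge"}}}
    ]
}
pool = machine_pool_from_install_config(install_config, "my-pool-cluster")
print(pool["spec"]["replicas"], pool["spec"]["platform"]["aws"]["type"])
```

Even this narrow version hints at the test surface involved: every platform and every field under platform would need the same treatment.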

Here's the thread that prompted this.

is related to

ACM-9772 MachinePool for ARM64 Cluster Claim has the wrong instance type

Assignee: Unassigned
Reporter: Eric Fried
Votes: 1
Watchers: 4
Created: 2023/06/27 7:31 PM
Updated: 2024/02/21 8:42 PM
