Story
Resolution: Unresolved
The syncMachineSets function in the MachinePool controller is going to be somewhat inefficient at scale, as its matching loop iterates #remoteMS * #generatedMS times. #generatedMS should always be fairly small given we're reconciling a single MachinePool – usually at most equal to the number of AZs (aka failure domains) in the spoke region. However, as currently written, #remoteMS is all the MachineSets on the spoke, which is generally O(#mpools * #msPerPool). In "normal" circumstances, #mpools is single-digit. However, the use case we're seeing that led to #ITN-2024-00101 boosts this to tens or hundreds. At that scale, the number of iterations of this loop can get into the thousands, which can start to matter on a busy hive.
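For context on where the cost comes from, here's a rough sketch of the loop shape described above; the function name, types, and structure are illustrative only, not the actual Hive implementation:

```go
// Illustrative sketch of the per-reconcile matching cost; not the actual Hive code.
// matchGeneratedMachineSets pairs each MachineSet generated for this MachinePool
// with its counterpart on the spoke by name. The nested loop runs
// len(remote) * len(generated) times.
package sketch

import machinev1 "github.com/openshift/api/machine/v1beta1"

func matchGeneratedMachineSets(remote, generated []machinev1.MachineSet) map[string]*machinev1.MachineSet {
	matched := map[string]*machinev1.MachineSet{}
	for i := range remote { // today: every MachineSet on the spoke, O(#mpools * #msPerPool)
		for j := range generated { // MachineSets for this pool, usually <= #failure domains
			if remote[i].Name == generated[j].Name {
				matched[generated[j].Name] = &remote[i]
			}
		}
	}
	return matched
}
```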
To mitigate the cost of this func, we can try a couple of things:
- Make the func more efficient algorithmically (HIVE-2538)
- Minimize the number of objects being processed (this card).
See thread for background. Compare and contrast with HIVE-2540, which aims to reduce the total amount of network traffic via caching, whereas this card aims to reduce the number of objects retrieved for use by syncMachineSets.
The solution could be as simple as adding a filter here that matches our machine-pool name label to the MachinePool being processed (see the sketch below). This should work because we subsequently match to generated MachineSets based on that label anyway. However, we'll also have to figure out another way to discover the network in GCP, which we currently do by introspecting a random remote MachineSet. That works today because the spoke will always have at least one MachineSet for workers, but a filtered list could come back empty for a new or scaled-to-zero pool. I wonder if, instead, we can use the master machine like we do in AWS to default the AMI.
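A hedged sketch of what that filter might look like, assuming the remote MachineSets are listed with a controller-runtime client and that the generated MachineSets carry a machine-pool name label. The label key, helper name, and signature below are assumptions for illustration, not confirmed against the Hive codebase:

```go
package sketch

import (
	"context"

	machinev1 "github.com/openshift/api/machine/v1beta1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// Assumed label key carried by the generated MachineSets; the real key used by
// the MachinePool controller may differ.
const machinePoolNameLabel = "hive.openshift.io/machine-pool"

// listRemoteMachineSetsForPool retrieves only the spoke MachineSets generated
// for the given pool, rather than every MachineSet in openshift-machine-api.
func listRemoteMachineSetsForPool(ctx context.Context, remoteClient client.Client, poolName string) ([]machinev1.MachineSet, error) {
	msList := &machinev1.MachineSetList{}
	if err := remoteClient.List(ctx, msList,
		client.InNamespace("openshift-machine-api"),
		client.MatchingLabels{machinePoolNameLabel: poolName},
	); err != nil {
		return nil, err
	}
	return msList.Items, nil
}
```

With a filter like this, #remoteMS shrinks from all MachineSets on the spoke to just the handful belonging to the pool being reconciled, so the nested loop above stays small regardless of how many MachinePools the cluster has. The remaining open question is the GCP network discovery noted above, since it can no longer rely on the filtered list being non-empty.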