OpenShift Hive / HIVE-2539

Rework the machinepool controller for network


    • Type: Story
    • Resolution: Unresolved
    • Priority: Normal

      The syncMachineSets function in the MachinePool controller is going to be somewhat inefficient at scale, because it iterates #remoteMS * #generatedMS times. #generatedMS should always be fairly small, given that we're reconciling a single MachinePool: usually at most the number of AZs (aka failure domains) in the spoke's region. However, as currently written, #remoteMS counts all the MachineSets on the spoke, which is generally O(#mpools * #msPerPool). In "normal" circumstances, #mpools is single-digit, but the use case that led to #ITN-2024-00101 boosts it to tens or hundreds. At that scale, the loop can run thousands of iterations, which can start to matter on a busy hive.
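To put rough numbers on that estimate, here is a minimal Go sketch of the arithmetic. The function name and the sample figures (3 MachineSets per pool, 3 generated MachineSets, 5 versus 300 pools) are illustrative assumptions, not measurements from a real hive.

```go
package main

import "fmt"

// loopIterations estimates how many inner-loop iterations a nested
// remote-by-generated comparison performs for one MachinePool reconcile.
// All parameters are illustrative inputs, not values read from a cluster.
func loopIterations(pools, msPerPool, generated int) int {
	remote := pools * msPerPool // every MachineSet on the spoke
	return remote * generated
}

func main() {
	// Assumed: 3 failure domains, so 3 MachineSets per pool and
	// 3 generated MachineSets for the pool being reconciled.
	fmt.Println(loopIterations(5, 3, 3))   // single-digit pool count: 45
	fmt.Println(loopIterations(300, 3, 3)) // hundreds of pools: 2700
}
```

If the remote list contained only the pool being reconciled, the same formula with `pools = 1` would give 9 iterations under these assumptions.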

      To mitigate the cost of this func, we can try a couple of things:

      • Make the func more efficient algorithmically (HIVE-2538).
      • Minimize the number of objects being processed (this card).
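As a point of comparison for the first option, one common way to avoid a nested loop is to index one side in a map. The sketch below is not taken from HIVE-2538; the `MachineSet` stand-in type, the `matchByName` helper, and the assumption that pairs are matched by name are all hypothetical.

```go
package main

import "fmt"

// MachineSet is a stand-in type for this sketch; only Name is modelled.
type MachineSet struct {
	Name string
}

// matchByName pairs each generated MachineSet with the remote one that has
// the same name. Indexing the remote list in a map first makes the work
// proportional to len(remote) + len(generated) instead of their product.
// Matching by name is an assumption made for illustration.
func matchByName(remote, generated []MachineSet) map[string]MachineSet {
	byName := make(map[string]MachineSet, len(remote))
	for _, r := range remote {
		byName[r.Name] = r
	}
	matched := make(map[string]MachineSet, len(generated))
	for _, g := range generated {
		if r, ok := byName[g.Name]; ok {
			matched[g.Name] = r
		}
	}
	return matched
}

func main() {
	remote := []MachineSet{{Name: "pool-a-1"}, {Name: "pool-a-2"}, {Name: "pool-b-1"}}
	generated := []MachineSet{{Name: "pool-a-1"}, {Name: "pool-a-3"}}
	fmt.Println(len(matchByName(remote, generated)))
}
```

This reduces the cost of the loop but still requires every remote MachineSet to be retrieved, which is the part this card targets.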

      See thread for background. Compare and contrast with HIVE-2540, which aims to reduce the total amount of network traffic via caching, whereas this card aims to reduce the number of objects retrieved for use by syncMachineSets.

      The solution would be as simple as adding a filter here that matches our machine-pool name label to the MachinePool being processed. This should work because we subsequently match to generated MachineSets based on that label. However, we'll also have to figure out another way to discover the network in GCP, which we currently do by introspecting a random remote MachineSet. Today that always works because the spoke always has at least one MachineSet for workers; with the list filtered to a single pool, that guarantee may no longer hold. I wonder if, instead, we can use the master machine, like we do in AWS to default the AMI.
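The selection such a filter would perform can be sketched in plain Go. The label key `hive.openshift.io/machine-pool`, the `MachineSet` stand-in type, and the `filterByPool` helper are assumptions for illustration. In a real controller the same condition would more likely be passed as a label selector on the list request (for example controller-runtime's `client.MatchingLabels`), so that unmatched objects are never retrieved; the in-memory loop below only shows which objects would remain.

```go
package main

import "fmt"

// poolLabel is an assumed label key carrying the MachinePool name.
const poolLabel = "hive.openshift.io/machine-pool"

// MachineSet is a stand-in type for this sketch.
type MachineSet struct {
	Name   string
	Labels map[string]string
}

// filterByPool keeps only the MachineSets labelled with the given pool name.
// Reading a missing key from a nil or empty map yields "", so unlabelled
// MachineSets are dropped as long as poolName is non-empty.
func filterByPool(all []MachineSet, poolName string) []MachineSet {
	var out []MachineSet
	for _, ms := range all {
		if ms.Labels[poolLabel] == poolName {
			out = append(out, ms)
		}
	}
	return out
}

func main() {
	all := []MachineSet{
		{Name: "worker-a", Labels: map[string]string{poolLabel: "worker"}},
		{Name: "infra-a", Labels: map[string]string{poolLabel: "infra"}},
		{Name: "unlabelled"},
	}
	for _, ms := range filterByPool(all, "worker") {
		fmt.Println(ms.Name)
	}
	// A pool with no MachineSets yet yields an empty result, which is why
	// GCP network discovery cannot rely on the filtered list.
	fmt.Println(len(filterByPool(all, "new-pool")))
}
```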

              Assignee: Unassigned
              Reporter: Eric Fried (efried.openshift)
              Votes: 0
              Watchers: 3