OpenShift Hive: HIVE-1651

Increase runningCount for claims in excess of pool size

    • Type: Story
    • Resolution: Done
    • Priority: Undefined

      Since HIVE-1576, you can specify ClusterPool.Spec.RunningCount to keep some number of unclaimed pool clusters running rather than hibernating.

      If your pool is in steady state and you create #claims <= RunningCount claims, all of those claims will get running ClusterDeployments (CDs). If RunningCount < #claims <= Size, RunningCount of them will already be running when claimed, and the remaining #claims - RunningCount will be kicked from Hibernating to Running. If #claims > Size, Size of them will behave as above, and #claims new CDs will be provisioned (enough to refill the pool and to cover the excess), so that the remaining #claims - Size claims can be assigned once those CDs finish installing.
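
      To make the three cases concrete, here's a small Go sketch of the arithmetic. It assumes the pool starts in steady state with Size unclaimed CDs, RunningCount (<= Size) of them Running and the rest Hibernating, and that all the claims arrive at once. claimOutcome, outcomeForClaims, minInt and maxInt are hypothetical names for illustration, not Hive code.

```go
package main

// claimOutcome is a hypothetical summary of what happens to a batch of
// claims created at once against a pool in steady state (not a Hive type).
type claimOutcome struct {
	AlreadyRunning    int // assigned a CD that was already Running
	Resumed           int // assigned a Hibernating CD, which is then kicked to Running
	WaitingForInstall int // must wait for a newly provisioned CD to finish installing
}

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

func maxInt(a, b int) int {
	if a > b {
		return a
	}
	return b
}

// outcomeForClaims assumes Size unclaimed CDs exist, RunningCount of them
// Running (RunningCount <= Size) and the rest Hibernating.
func outcomeForClaims(runningCount, size, claims int) claimOutcome {
	served := minInt(claims, size)          // claims the existing pool can satisfy
	running := minInt(served, runningCount) // of those, how many get a Running CD
	return claimOutcome{
		AlreadyRunning:    running,
		Resumed:           served - running,
		WaitingForInstall: maxInt(0, claims-size),
	}
}
```

      For example, with RunningCount=2, Size=5 and 8 claims, this gives 2 claims on already-running CDs, 3 on resumed CDs, and 3 waiting for new installs.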

      In this last scenario, if RunningCount >= #claims - Size, you get lucky: your excess claims will get clusters that were provisioned with PowerState=Running. But if RunningCount < #claims - Size, then #claims - Size - RunningCount of them will get clusters provisioned with PowerState=Hibernating. As soon as those clusters finish installing they get assigned, whereupon they're kicked to Running; but there will be some interval during which hibernation may already have taken effect [1][2].
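
      The count of excess claims that land on Hibernating-provisioned CDs can be written as a tiny Go function, under the same steady-state assumptions as above. The function name is hypothetical, not something in Hive.

```go
package main

// excessClaimsOnHibernatingCDs (hypothetical name) returns how many claims in
// excess of the pool Size will be matched with CDs that were provisioned
// with PowerState=Hibernating.
func excessClaimsOnHibernatingCDs(runningCount, size, claims int) int {
	excess := claims - size
	if excess <= 0 {
		return 0 // no excess claims: nobody waits on a fresh install
	}
	if runningCount >= excess {
		return 0 // the "lucky" case: every excess claim gets a CD provisioned Running
	}
	return excess - runningCount // #claims - Size - RunningCount
}
```

      For example, RunningCount=2, Size=5 and 8 claims leaves 1 excess claim on a CD that was provisioned Hibernating.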

      The point of this card is to mitigate that last scenario such that those excess CDs are provisioned with PowerState=Running.

      Except that's not quite right. Because we want to enforce FIFO behavior, we always want the oldest CDs to be the ones that are Running, and the oldest CDs to be the ones that get claimed. I don't think those are necessarily (all of) the clusters we're provisioning to handle the excess.

      So I think the answer here is to calculate and use an "effective RunningCount": the original RunningCount plus some number derived from the claims in excess of the pool size (zero when there are no excess claims). I don't yet know exactly what that calculation should be, but that's the idea.
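
      One candidate for that calculation, offered as an assumption rather than the answer: of the #claims new CDs, #claims - Size end up assigned to the excess claims and the rest refill the pool, where RunningCount of them should be Running. That suggests adding the excess to RunningCount. The function name is hypothetical.

```go
package main

// effectiveRunningCount is a hypothetical candidate formula, not necessarily
// what Hive implements: RunningCount plus the number of claims in excess of
// the pool Size. Assuming RunningCount <= Size, the result never exceeds
// #claims, the number of new CDs provisioned in the excess case.
func effectiveRunningCount(runningCount, size, claims int) int {
	excess := claims - size
	if excess < 0 {
		excess = 0 // no excess claims: effective count equals RunningCount
	}
	return runningCount + excess
}
```

      With RunningCount=2, Size=5 and 8 claims this yields 5: 3 CDs for the excess claims plus 2 to keep the refilled pool at RunningCount.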

      [1] Note that this scenario already existed prior to RunningCount: it applied any time a claim was waiting for the pool to finish installing a CD.
      [2] Hibernation is mostly a black box to Hive. We trigger machine (de)activations, but we don't really know or control what happens at the cloud provider. For example, we don't know whether an in-progress hibernation is reversible, or whether it must finish hibernating before it can turn around and start back up, and the answer may differ between cloud providers. What I'm getting at is that we don't really know the cost of this edge case.

              People: Eric Fried (efried.openshift), Lin Wang
              Votes: 0
              Watchers: 2
