OpenShift Hive / HIVE-2224

Lockstep hibernation with MachineConfigPool upgrades


Details

    • In Progress
    • OCPSTRAT-543 - Shutdown/Resume of managed OSD/ROSA clusters

    Description

      One of the criteria for hibernation in OSD/ROSA is:

      Cluster shutdown must be blocked if the MachineConfigPools are in updating state.
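A minimal sketch of what "in updating state" could mean, assuming the check reads the `Updating` condition from each MachineConfigPool's status (the real MachineConfigPool API does report `Updated`, `Updating` and `Degraded` conditions). The `Condition` and `Pool` types here are simplified stand-ins rather than the real API types, and `anyPoolUpdating` is a hypothetical helper name.

```go
package main

import "fmt"

// Condition is a simplified stand-in for a MachineConfigPool status condition.
type Condition struct {
	Type   string // e.g. "Updated", "Updating", "Degraded"
	Status string // "True", "False" or "Unknown"
}

// Pool is a simplified stand-in for a MachineConfigPool; only the fields
// this sketch needs are modeled.
type Pool struct {
	Name       string
	Conditions []Condition
}

// isUpdating reports whether the pool's Updating condition is "True".
// A pool with no Updating condition is treated as not updating.
func isUpdating(p Pool) bool {
	for _, c := range p.Conditions {
		if c.Type == "Updating" {
			return c.Status == "True"
		}
	}
	return false
}

// anyPoolUpdating returns the names of all pools that are currently updating.
// Under the criterion above, hibernation is blocked if the result is non-empty.
func anyPoolUpdating(pools []Pool) []string {
	var updating []string
	for _, p := range pools {
		if isUpdating(p) {
			updating = append(updating, p.Name)
		}
	}
	return updating
}

func main() {
	pools := []Pool{
		{Name: "master", Conditions: []Condition{{Type: "Updated", Status: "True"}, {Type: "Updating", Status: "False"}}},
		{Name: "worker", Conditions: []Condition{{Type: "Updated", Status: "False"}, {Type: "Updating", Status: "True"}}},
	}
	fmt.Println(anyPoolUpdating(pools)) // prints [worker]
}
```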

      There is an inherent timing problem with simply doing:

      if isUpgrading(MCO) {
        return errors.New("Can't hibernate during MCO upgrade")
      }
      hibernate()
      

      because an upgrade could kick off between the moment we check and the moment we initiate hibernation (a check-then-act race). Thus it would have to look more like:

      freezeMCOUpgrades()
      if isUpgrading(MCO) {
        unfreezeMCOUpgrades()
        return errors.New("Can't hibernate during MCO upgrade")
      }
      hibernate()
      
      // ...and then in the resume flow
      resume()
      unfreezeMCOUpgrades()
      

      This assumes that freezeMCOUpgrades() is possible. I'm told it is, but if you freeze in the middle of an upgrade, the MCO will finish whichever machine it is currently updating and leave the rest un-updated. So some additional coordination will be necessary to ensure the freeze lands either before that whole process starts or after it completes.
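One way to detect the "frozen mid-rollout" state described above, sketched under assumptions: "frozen" means the pool has `spec.paused: true`, and the pool status's `machineCount` and `updatedMachineCount` (both real MachineConfigPool status fields) are taken to mean "machines in the pool" and "machines already on the target configuration". `PoolStatus`, `RolloutState` and `classify` are hypothetical names. In this sketch only a `Settled` pool would allow hibernation; a `FrozenMidRollout` pool would be unfrozen so the MCO can finish.

```go
package main

import "fmt"

// PoolStatus is a simplified stand-in for the MachineConfigPool fields this
// check relies on. Assumption: UpdatedMachineCount counts machines already
// running the pool's target configuration.
type PoolStatus struct {
	Name                string
	Paused              bool
	MachineCount        int32
	UpdatedMachineCount int32
}

// RolloutState classifies a pool relative to a freeze.
type RolloutState int

const (
	Settled          RolloutState = iota // every machine is on the target config
	RollingOut                           // not paused and still updating machines
	FrozenMidRollout                     // paused with some machines left behind
)

func (s RolloutState) String() string {
	return [...]string{"Settled", "RollingOut", "FrozenMidRollout"}[s]
}

// classify places a pool in one of the three states above.
func classify(p PoolStatus) RolloutState {
	switch {
	case p.UpdatedMachineCount >= p.MachineCount:
		return Settled
	case p.Paused:
		return FrozenMidRollout
	default:
		return RollingOut
	}
}

func main() {
	pools := []PoolStatus{
		{Name: "master", Paused: true, MachineCount: 3, UpdatedMachineCount: 3},
		{Name: "worker", Paused: true, MachineCount: 6, UpdatedMachineCount: 2},
	}
	for _, p := range pools {
		fmt.Printf("%s: %s\n", p.Name, classify(p))
	}
	// master: Settled
	// worker: FrozenMidRollout
}
```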

      I'm also told that in 4.13+, cert rotation is done independently of the upgrade procedure. Assuming cert rotation is the motivation behind the original restriction ("no hibernation during MCO upgrades"), this may make the issue moot for 4.13+, but it adds an extra criterion to the logic: the check is only needed for clusters older than 4.13.
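If the 4.13 cutoff holds, the MachineConfigPool check could be gated on the cluster version. A minimal sketch, assuming the version is available as an `X.Y.Z` string (for example from the ClusterVersion resource) and that the cutoff is exactly 4.13, which the description only reports second-hand. `needsMCPCheck` is a hypothetical name. Only major and minor are parsed, and unparseable versions fail safe by requiring the check.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// needsMCPCheck reports whether hibernation must be blocked during
// MachineConfigPool updates for the given cluster version.
// Assumption: 4.13+ rotates certs independently of upgrades, so only
// versions below 4.13 need the check. Unparseable input returns true.
func needsMCPCheck(version string) bool {
	parts := strings.SplitN(version, ".", 3)
	if len(parts) < 2 {
		return true
	}
	major, err1 := strconv.Atoi(parts[0])
	minor, err2 := strconv.Atoi(parts[1])
	if err1 != nil || err2 != nil {
		return true
	}
	return major < 4 || (major == 4 && minor < 13)
}

func main() {
	for _, v := range []string{"4.12.30", "4.13.0", "4.14.5", "garbage"} {
		fmt.Println(v, needsMCPCheck(v))
	}
	// 4.12.30 true
	// 4.13.0 false
	// 4.14.5 false
	// garbage true
}
```

Pre-release suffixes such as `4.13.0-rc.1` only affect the third component, so they are classified by their major and minor version.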


            People

              Leah Leshchinsky
              Eric Fried
              Ju Lim
              Votes: 0
              Watchers: 6
