OpenShift Request For Enhancement / RFE-4651

Allow compute MachineConfigPools to opt-out of blocking ClusterVersion updates


    • Feature Request
    • Resolution: Unresolved
    • MCO

      1. Proposed title of this feature request

      Allow compute MachineConfigPools to opt-out of blocking ClusterVersion updates

      2. What is the nature and description of the request?

      MachineConfigPools have supported a machineconfiguration.openshift.io/required-for-upgrade annotation since 4.1. Pools with this annotation block the machine-config operator from declaring updates complete in its ClusterOperator, which in turn blocks the cluster-version operator from continuing past the MCO in OpenShift updates. The control-plane pool is required to set this annotation, and compute pools are allowed, but not required, to set it. Since 4.7's RFE-908 GRPA-2779, even pools that do not set the annotation block the MCO from declaring ClusterOperator update completion if they are Degraded=True. This RFE suggests adding a way to opt out of that functionality, so pools can avoid blocking ClusterOperator update completion, regardless of their Degraded status.
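The current behavior described above can be modeled roughly as follows. This is a minimal sketch, not MCO code: the Pool class, its updated and degraded fields, and the blocks_update function are hypothetical names introduced for illustration, and the assumption that a required pool blocks until it is both updated and not Degraded is a simplification.

```python
from dataclasses import dataclass, field

REQUIRED_KEY = "machineconfiguration.openshift.io/required-for-upgrade"


@dataclass
class Pool:
    """Hypothetical, simplified stand-in for a MachineConfigPool."""
    name: str
    annotations: dict = field(default_factory=dict)
    updated: bool = True    # assumption: pool has finished rolling out its config
    degraded: bool = False  # mirrors the Degraded=True condition


def blocks_update(pool: Pool) -> bool:
    """Return True if this pool would hold back ClusterOperator update completion."""
    if REQUIRED_KEY in pool.annotations:
        # Only key presence matters today; the annotation value is ignored.
        return (not pool.updated) or pool.degraded
    # Pools without the annotation block only while Degraded=True (since 4.7).
    return pool.degraded


if __name__ == "__main__":
    pools = [
        Pool("master", {REQUIRED_KEY: ""}, updated=False),
        Pool("worker", updated=False),
        Pool("worker-degraded", degraded=True),
    ]
    for pool in pools:
        print(pool.name, blocks_update(pool))
```

Under this model, the opt-out requested here would give a compute pool a way to make blocks_update return False even while the pool is Degraded=True.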

      3. Why does the customer need this? (List the business requirements here)

      Back in 4.7, there were fewer guards against compute node skew, which has upstream limits. But since 4.9, backported through 4.7.41, the Kubernetes API server operator has tracked kubelet version skew and set Upgradeable=False to block control-plane updates that would introduce excessive skew relative to current nodes. Having compute pools continue to block ClusterOperator update completion, and with it the rest of a control-plane/ClusterVersion update, may still be useful for many clusters. But letting customers opt out would allow them to quickly roll out control-plane improvements without draining compute (and possibly disrupting workloads that haven't yet learned how to drain gracefully during updates) to roll out the updated MachineConfigs (including new RHCOS). That should reduce the uncertainty and fear around control-plane updates, which we expect to be very low-disruption for customer workloads, by decoupling them from the compute update, where the cluster admin has knobs like paused on the MachineConfigPool if they want to slow things down.

      4. List any affected packages or components.

      MCO. Docs. Possibly oc adm ..., web-console, etc. if folks wanted to bubble this up into friendlier UIs, although we can open with an annotation and see if there's interest before investing time in friendly UIs.

      The current machineconfiguration.openshift.io/required-for-upgrade is just checking for key-presence, so we probably don't want to use values like:

      machineconfiguration.openshift.io/required-for-upgrade: "false"
      

      although it seems unlikely that anyone is actually using "false" as the value today, so it's not entirely off the table. We could define a new annotation like:

      machineconfiguration.openshift.io/update-effect: required
      machineconfiguration.openshift.io/update-effect: optional
      

      or some such and move our internal use to the new annotation. Or we could define a new machineconfiguration.openshift.io/optional-for-upgrade key, if we want to roll forward with the "just use keys, without worrying about values" precedent. Personally, though, I prefer caring about the values, because it makes it harder to say impossible things like declaring required-for-upgrade and optional-for-upgrade simultaneously. If we go with a new, value-using annotation, we could (but would not have to) deprecate machineconfiguration.openshift.io/required-for-upgrade, and eventually alert on anyone using it in-cluster and/or go Upgradeable=False to encourage folks to migrate.
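One way the value-using option could be interpreted is sketched below. This is an illustration under stated assumptions, not a proposed implementation: the update_effect function is hypothetical, the update-effect key is only one of the names floated above, and the choices to fall back to the legacy key and to reject a pool that sets both the legacy key and "optional" are assumptions made for the example.

```python
REQUIRED_KEY = "machineconfiguration.openshift.io/required-for-upgrade"
EFFECT_KEY = "machineconfiguration.openshift.io/update-effect"  # proposed, not existing


def update_effect(annotations: dict):
    """Return 'required', 'optional', or None when the pool states no preference."""
    effect = annotations.get(EFFECT_KEY)
    if effect is not None:
        if effect not in ("required", "optional"):
            raise ValueError(f"unrecognized {EFFECT_KEY} value: {effect!r}")
        if effect == "optional" and REQUIRED_KEY in annotations:
            # Assumption: contradictory legacy and new annotations are an error.
            raise ValueError("pool is both required (legacy key) and optional")
        return effect
    if REQUIRED_KEY in annotations:
        # Legacy semantics: key presence alone means required, even for "false".
        return "required"
    return None


if __name__ == "__main__":
    print(update_effect({EFFECT_KEY: "optional"}))
    print(update_effect({REQUIRED_KEY: "false"}))
    print(update_effect({}))
```

A single key with an enumerated value leaves exactly one place to state the pool's role, so the only contradiction left to handle is the one with the legacy key during a migration period.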

            rhn-support-mrussell Mark Russell
            trking W. Trevor King