User triggered delayed node rollouts in HyperShift upgrades

      Feature Overview (aka. Goal Summary)  

      Rolling out new versions of the HyperShift Operator or of Hosted Control Plane components, such as HyperShift's Control Plane Operator, will no longer be able to trigger a Node rollout that affects customer workloads running on those nodes.

      Goals (aka. expected user outcomes)

      The exhaustive list of causes for a customer Nodepool rollout will be:

      • The customer directly scaling the Nodepool up or down
      • The customer changing Hosted Cluster or Nodepool configuration that is documented to incur a rollout

      Customers will have visibility into pending rollouts so that they can trigger a rollout of their affected Nodepools at their earliest convenience.

      Requirements (aka. Acceptance Criteria):

      • Observability:
        • It must be possible to account for all Nodepools with pending rollouts
        • It must be possible to identify all the Hosted Clusters with Nodepools with pending rollouts
        • It must be possible for a customer to see that a Nodepool has pending rollouts
      • Kubernetes expectations on resource reconciliation must be upheld
      • Queued rollouts must survive HyperShift restarts
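
      The observability requirements above can be illustrated with a minimal sketch. It assumes that a pending rollout is exposed as a status condition on each Nodepool; the condition name `RolloutPending`, the `NodePoolStatus` type, and the sample data are all hypothetical, since this feature does not define them yet.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumption: the condition name is illustrative only; the feature does not define one yet.
PENDING_CONDITION = "RolloutPending"


@dataclass
class NodePoolStatus:
    """Hypothetical, simplified view of a Nodepool for reporting purposes."""
    name: str
    hosted_cluster: str
    # Assumed shape: condition type -> "True" / "False", as in Kubernetes status conditions.
    conditions: dict


def has_pending_rollout(pool):
    """A single Nodepool is pending when the assumed condition is "True"."""
    return pool.conditions.get(PENDING_CONDITION) == "True"


def pending_by_cluster(pools):
    """Group the names of Nodepools with pending rollouts by Hosted Cluster."""
    result = defaultdict(list)
    for pool in pools:
        if has_pending_rollout(pool):
            result[pool.hosted_cluster].append(pool.name)
    return {cluster: sorted(names) for cluster, names in result.items()}


# Made-up sample data for illustration.
pools = [
    NodePoolStatus("workers-a", "cluster-1", {PENDING_CONDITION: "True"}),
    NodePoolStatus("workers-b", "cluster-1", {PENDING_CONDITION: "False"}),
    NodePoolStatus("workers-c", "cluster-2", {}),
    NodePoolStatus("workers-d", "cluster-3", {PENDING_CONDITION: "True"}),
]
report = pending_by_cluster(pools)
```

      In this sketch the values of `report` account for every pending Nodepool, its keys identify the affected Hosted Clusters, and `has_pending_rollout` answers the per-Nodepool question a customer would ask.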

       

      Anyone reviewing this Feature needs to know which deployment configurations the Feature will apply to (or not) once it has been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out of scope for a given release, also provide the OCPSTRAT that tracks the configuration to be supported in the future.

      Deployment considerations | List applicable specific needs (N/A = not applicable)
      Self-managed, managed, or both | Managed (ROSA and ARO)
      Classic (standalone cluster) | No
      Hosted control planes | Yes
      Multi node, Compact (three node), or Single node (SNO), or all | All supported Managed Hosted Control Plane topologies and configurations
      Connected / Restricted Network | All supported Managed Hosted Control Plane topologies and configurations
      Architectures, e.g. x86_64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | All supported Managed Hosted Control Plane topologies and configurations
      Operator compatibility | N/A
      Backport needed (list applicable versions) | All supported Managed Hosted Control Plane releases
      UI need (e.g. OpenShift Console, dynamic plugin, OCM) | Yes. Console representation of Nodepools/Machinepools should indicate pending rollouts and allow them to be triggered
      Other (please specify) |

      Use Cases (Optional):

      • A Managed OpenShift fix requires a Node rollout in order to be applied

      Questions to Answer (Optional):

      • rh-ee-brcox
        • Should we explicitly call out the OCP versions in Backport needed (list applicable versions)?
          • asegurap1@redhat.com: Depends on what the supported OCP versions are in Managed OpenShift by the time this feature is delivered
        • Is there a missing Goal of this feature, to force a rollout at a particular date? My line of thinking is what about CVE issues on the NodePool OCP version - are we giving them a warning like "hey you have a pending rollout because of a CVE; if you don't update the nodes yourself, we will on such & such date"?
      • jparrill@redhat.com 
        • What are the expectations for a regular customer NodePool upgrade? Will the change be applied directly, or queued behind the last requested change?
        • Does this apply only to NodePool changes, or would it also affect CP upgrades (thinking of OVN changes that could also affect the data plane)?
          • asegurap1@redhat.com: CP upgrades that would trigger Nodepool rollouts are in scope. OVN changes should only apply if CNO or its daemonsets are going to cause reboots
        • How will the customer trigger the pending rollouts? Will an alert fire in the hosted cluster console?
          • asegurap1@redhat.com: I guess there are multiple options like scaling down and up and also adding some API to Nodepool
        • I assume we will use a new status condition to reflect the current queue of pending rollouts. Is that the case?
          • asegurap1@redhat.com: That's a good guess. Hopefully we can represent all we want with it or we constrain ourselves to what it can express
        • With "Queued rollouts must survive HyperShift restarts"... What kind of details do we want to store there (“there” being the place where the queued changes are persisted): the order, the number of rollouts, the destination hashes, more info…?
          • asegurap1@redhat.com: I'll leave that as an open question to refine
            If there is more than one change pending, do we assume there will be more than one reboot?
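
      To make the persistence question above more concrete, here is a minimal sketch of a queue that round-trips through a serialized form, such as a value stored on the Nodepool object itself. What is stored and where is still open in this discussion, so the `PendingRollout` fields (`reason`, `target_hash`) and the JSON encoding are assumptions for illustration only.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class PendingRollout:
    # All fields are hypothetical; the feature leaves the stored details open.
    reason: str       # why the rollout was queued
    target_hash: str  # destination configuration hash


def dump_queue(queue):
    """Serialize the queue, in order, to a string that could be persisted."""
    return json.dumps([asdict(item) for item in queue])


def load_queue(raw):
    """Rebuild the queue after a restart; an empty value means nothing is pending."""
    if not raw:
        return []
    return [PendingRollout(**item) for item in json.loads(raw)]


# Made-up entries for illustration.
queue = [
    PendingRollout("ignition generation fix", "hash-1"),
    PendingRollout("control plane operator update", "hash-2"),
]
persisted = dump_queue(queue)
restored = load_queue(persisted)
```

      The sketch only preserves the order and content of the queued entries across a restart. Whether several pending entries result in one reboot or several remains the open question raised above.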

      Out of Scope

      • Maintenance windows
      • Queuing of rollouts on user actions (as that does not meet the Kubernetes reconciliation expectations and is better addressed either at the Cluster Service API level or, better yet, on the customer automation side).
      • Forced rollouts of pending updates on a certain date. That should be handled at the Cluster Service level if there is a desire to provide it.

      Background

      Past incidents in which fixes to ignition generation resulted in rollouts that the customer did not expect and that impacted workloads.

      Customer Considerations

      There should be an easy way to see queued updates, understand their content, and trigger them.

      Documentation Considerations

      SOPs for the observability above

      ROSA documentation for queued updates

      Interoperability Considerations

      ROSA/HCP and ARO/HCP

              Unassigned
              asegurap1@redhat.com Antoni Segura Puimedon
              He Liu
              Aaren de Jong