OpenShift Request For Enhancement: RFE-4873

Improvements on MNO in-service OCP Update & Upgrade processes, including Operators, for Telco Environments.


      1. Proposed title of this feature request

      Improvements on OCP Update processes, including Red Hat provided Day2 Operators, for Telco Environments.

       

      2. What is the nature and description of the request?

      Terms used:

      • Updates: 
        • z-stream update (e.g. 4.12.19 to 4.12.21) 
        • minor version or y-version update (e.g. 4.12.21 to 4.13.1) 
        • EUS-to-EUS update (e.g. 4.10.51 to 4.12.21)
      • MNO (multi-node OpenShift): any topology with redundancy at the control-plane level, regardless of whether the control-plane role runs on dedicated nodes or on nodes that also run workloads (schedulable masters).
      • Maintenance window (MW): a typical MW lasts 4 hours. Any update process must be completed within 2 hours of the MW. The remaining 2 hours are reserved for other purposes, such as troubleshooting and restoring the system after a failure.
      • Outage time: the continuous period from the moment a CNF can no longer serve traffic until the point at which the CNF is starting but is not yet in a ready state. If multiple outages occur during an update procedure, outage time is measured from the start of the first outage until the end of the last one.
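
      To make the update types and the outage-time rule above concrete, here is a minimal Python sketch; it is not part of any Red Hat tooling. The names `Version`, `classify_update`, and `outage_minutes` are hypothetical. The sketch assumes that EUS releases are the even-numbered 4.y minor versions and that each outage is recorded as a (start, end) pair in minutes since the update began.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Version:
    """An OCP release version such as 4.12.21 (major.minor.patch)."""
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "Version":
        major, minor, patch = (int(part) for part in text.split("."))
        return cls(major, minor, patch)


def classify_update(current: str, target: str) -> str:
    """Map a (current, target) pair onto the update types defined above."""
    src, dst = Version.parse(current), Version.parse(target)
    if dst <= src or dst.major != src.major:
        raise ValueError(f"unsupported update path {current} -> {target}")
    minor_delta = dst.minor - src.minor
    if minor_delta == 0:
        return "z-stream"
    if minor_delta == 1:
        return "y-stream"
    # Assumption: EUS releases are the even-numbered 4.y minors (4.10, 4.12, ...).
    if minor_delta == 2 and src.minor % 2 == 0:
        return "EUS-to-EUS"
    raise ValueError(f"unsupported update path {current} -> {target}")


def outage_minutes(outages: list[tuple[float, float]]) -> float:
    """Outage time: from the start of the first outage to the end of the last.

    Each tuple is (start, end) in minutes since the update began (assumed format).
    """
    if not outages:
        return 0.0
    first_start = min(start for start, _ in outages)
    last_end = max(end for _, end in outages)
    return last_end - first_start
```

      For example, `classify_update("4.10.51", "4.12.21")` returns `"EUS-to-EUS"`, and two outages at minutes 10 to 15 and 30 to 42 count as 32 minutes of outage time under the first-to-last rule, not 17.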

      This RFE requests a set of improvements to all types of update processes for all valid topologies (SNO, SNO+1, MNO) used by Telco Partners in bare-metal, disconnected environments. The Partner expects all requirements to be delivered by OCP 4.16 unless otherwise stated.

      Deployments:

      • MNO: a three-node cluster with schedulable or non-schedulable Control Plane (master) nodes, plus 0 to 15 additional Compute (worker) nodes.

      The following list summarizes the improvements that should be introduced:

      1. MNO
        1. Non-rolling update is covered by RFE-4872
        2. Canary rollout update (in-service update)
          • A cluster can have from 2 to 15 worker pools.
          • Control plane nodes must complete the update process within 60 minutes in total. 
          • Each worker pool must complete the update process within 20 minutes. After each worker pool is updated, the cluster must remain in a stable operational state for an extended period, because the update of all worker pools may need to be split across several Maintenance Windows held several days apart.
          • These durations should include the update of Red Hat-provided Day2 Operators.
          • No outage should occur during the update process; however, service degradation is expected, depending on the number of worker pools.
          • The upgrade process should allow the user to supply additional configuration (e.g. MachineConfig) to be applied during the upgrade, without causing more than one reboot per node over the entire process.
          • In case of an unsuccessful update, the cluster must be in a stable operational state without capacity or service degradation at the end of the Maintenance Window. 
          • A list of checkpoints to ensure that the cluster remains in a stable operational state, without capacity or service degradation, throughout the update process.
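
      The timing budgets and the checkpoint requirement above can be sketched in Python as follows. This is a minimal, hypothetical illustration: `plan_canary_rollout` and `pool_checkpoint` are invented names. The sketch assumes that every step consumes its full budget (60 minutes for the control plane, 20 minutes per worker pool) within the 2-hour update portion of each MW, and that the control plane is updated first. The checkpoint reads the `status` block of a single MachineConfigPool item from `oc get mcp -o json`. Which conditions and counters count as "stable" is an assumption here, not a Red Hat-defined criterion.

```python
# Budgets from the RFE, in minutes.
UPDATE_BUDGET_PER_MW = 120   # update work must fit in 2 h of a 4 h MW
CONTROL_PLANE_BUDGET = 60    # all control plane nodes, in total
WORKER_POOL_BUDGET = 20      # each worker pool


def plan_canary_rollout(pool_names: list[str]) -> list[list[str]]:
    """Greedily pack the control plane and worker pools into maintenance windows.

    Hypothetical planner: assumes each step uses its full budget and that the
    control plane is updated first, in the first maintenance window.
    """
    if not 2 <= len(pool_names) <= 15:
        raise ValueError("the RFE assumes 2 to 15 worker pools")
    windows: list[list[str]] = [["control-plane"]]
    used = CONTROL_PLANE_BUDGET
    for pool in pool_names:
        # Open a new maintenance window when the next pool no longer fits.
        if used + WORKER_POOL_BUDGET > UPDATE_BUDGET_PER_MW:
            windows.append([])
            used = 0
        windows[-1].append(pool)
        used += WORKER_POOL_BUDGET
    return windows


def pool_checkpoint(pool: dict, expect_updated: bool) -> bool:
    """Hypothetical stability checkpoint for one MachineConfigPool object.

    Pools that should already be updated must report Updated=True with every
    machine updated and ready. Pools that are still paused are only checked
    for degradation, because they intentionally run the old configuration.
    """
    status = pool.get("status", {})
    conditions = {c["type"]: c["status"] for c in status.get("conditions", [])}
    machines = status.get("machineCount", 0)
    healthy = (
        conditions.get("Degraded") != "True"
        and status.get("degradedMachineCount", 0) == 0
    )
    if not expect_updated:
        return healthy
    return (
        healthy
        and conditions.get("Updated") == "True"
        and status.get("updatedMachineCount") == machines
        and status.get("readyMachineCount") == machines
    )
```

      Under these assumptions, a cluster with 15 worker pools needs three Maintenance Windows: the control plane plus 3 pools in the first, then 6 pools in each of the next two.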

      3. Why does the customer need this? (List the business requirements here)

      Communication Regulators and Communication Service Providers have very strict requirements for availability (outage), the execution time of any maintenance, and the recovery plan for any unforeseen issues. In addition, the goal is to eliminate, or at least minimize, any outage of the service.

       

      4. List any affected packages or components.

      RHCOS

      OpenShift

      Operators used by Telco Partners

      ACM

            rh-ee-smodeel Subin M
            phuet1@redhat.com Philippe Huet