OpenShift Request For Enhancement, RFE-4208

Improvements on OCP SNO+1 Update & Upgrade processes, including Operators, for Telco Environments.


    • Issue Type: Feature Request
    • Resolution: Done

      1. Proposed title of this feature request

      Improvements on OCP SNO+1 Update processes, including Red Hat provided Day2 Operators, for Telco Environments.

       

      2. What is the nature and description of the request?

      Terms used:

      • Updates: 
        • z-stream update (e.g. 4.12.19 to 4.12.21) 
        • minor version or y-version update (e.g. 4.12.21 to 4.13.1) 
        • EUS-to-EUS update (e.g. 4.10.51 to 4.12.21)
      • Maintenance window (MW): Typically, the MW duration is 4 hours. Any update process must complete within 2 hours of the MW. The remaining 2 hours are reserved for other purposes, such as troubleshooting and restoring the system in case of a failure.
      • Outage time: the continuous time frame from the moment the CNF cannot serve traffic until the point where the CNF is starting but is not yet in a ready state. If multiple outages occur during an update procedure, outage time is measured from the start of the first occurrence until the end of the last.
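      As an illustration of the outage-time definition above, the following minimal Python sketch treats multiple outages as a single span from the start of the first to the end of the last. The function name `outage_time` and the sample timestamps are hypothetical and are not part of the RFE.

```python
from datetime import datetime, timedelta

def outage_time(outages):
    """Return the outage time for a list of (start, end) datetime pairs.

    Following the definition in this RFE, multiple outages are counted as
    one span: from the start of the first outage to the end of the last.
    """
    if not outages:
        return timedelta(0)
    first_start = min(start for start, _ in outages)
    last_end = max(end for _, end in outages)
    return last_end - first_start

# Hypothetical example: two 4-minute outages during one update.
t0 = datetime(2024, 1, 1, 2, 0)
outages = [
    (t0 + timedelta(minutes=5), t0 + timedelta(minutes=9)),
    (t0 + timedelta(minutes=14), t0 + timedelta(minutes=18)),
]
print(outage_time(outages))  # 0:13:00
```

      Note that the two outages last 8 minutes in total, but the outage time under this definition is 13 minutes, because the gap between them is included.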

      This RFE requests a list of improvements to all types of update processes for all valid topologies (SNO, SNO+1, MNO) used by Telco Partners in bare-metal, disconnected environments. The partner expects all requirements to be delivered by OCP 4.16, unless otherwise stated. 

      Deployments:

      • SNO: a single node cluster 
      • SNO+1: a single node cluster (SNO) with an additional Compute (worker) node

      The following list summarizes the improvements that should be introduced:

      1. SNO
        • The cluster update process is covered by TELCOSTRAT-160. The expected time durations will be updated directly in the corresponding feature.
      2. SNO+1
        • The duration of any z-stream or y-version update should be no more than 30 minutes, with a maximum total outage time of 15 minutes.
        • The duration of any EUS-to-EUS update should be no more than 60 minutes, with a maximum total outage time of 30 minutes.
        • The aforementioned duration and outage time should include the update of Red Hat provided Day2 Operators.
        • In the case of SNO+1, it is acceptable that both SNO and the dedicated compute (worker) node can be restarted in parallel.
        • The update process should allow the user to provide additional configuration (e.g. MachineConfig) that will be applied during the update process, without causing more than one reboot per node for the entire process.
        • In case of an unsuccessful update, the cluster must be in a stable operational state without capacity or service degradation at the end of the Maintenance Window.
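      The SNO+1 time limits above can be expressed as a simple check. The following Python sketch is illustrative only: the names `classify_update`, `within_limits`, and `SNO_PLUS_1_LIMITS` are hypothetical, and the classification rule (same minor is z-stream, minor +1 is y-version, minor +2 between even-numbered minors is EUS-to-EUS) is an assumption inferred from the examples in the terms list, not an official definition.

```python
from datetime import timedelta

# (max update duration, max outage time) per update type, taken from the
# SNO+1 requirements listed in this RFE.
SNO_PLUS_1_LIMITS = {
    "z-stream": (timedelta(minutes=30), timedelta(minutes=15)),
    "y-version": (timedelta(minutes=30), timedelta(minutes=15)),
    "eus-to-eus": (timedelta(minutes=60), timedelta(minutes=30)),
}

def classify_update(current, target):
    """Classify an update between two 'major.minor.patch' version strings.

    Assumption: EUS-to-EUS means a jump of two minor versions between
    even-numbered minors (e.g. 4.10 to 4.12).
    """
    cur_major, cur_minor, _ = (int(p) for p in current.split("."))
    tgt_major, tgt_minor, _ = (int(p) for p in target.split("."))
    if cur_major != tgt_major:
        raise ValueError("major version changes are not covered here")
    delta = tgt_minor - cur_minor
    if delta == 0:
        return "z-stream"
    if delta == 1:
        return "y-version"
    if delta == 2 and cur_minor % 2 == 0:
        return "eus-to-eus"
    raise ValueError(f"unsupported update path: {current} -> {target}")

def within_limits(update_type, duration, outage):
    """Return True if both the duration and the outage time meet the limits."""
    max_duration, max_outage = SNO_PLUS_1_LIMITS[update_type]
    return duration <= max_duration and outage <= max_outage

# The version pairs below are the examples given in the terms list.
print(classify_update("4.12.19", "4.12.21"))  # z-stream
print(classify_update("4.12.21", "4.13.1"))   # y-version
print(classify_update("4.10.51", "4.12.21"))  # eus-to-eus
```

      For example, a hypothetical EUS-to-EUS run that took 55 minutes with 25 minutes of outage would pass `within_limits("eus-to-eus", timedelta(minutes=55), timedelta(minutes=25))`, while the same figures would fail for a z-stream update.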

      3. Why does the customer need this? (List the business requirements here)

      Communication Regulators and Communication Service Providers have very strict requirements in terms of availability (outage), execution time of any maintenance, and recovery plans for any unforeseen issues. Moreover, service outages must be eliminated or minimized.

       

      4. List any affected packages or components.

      RHCOS

      OpenShift

      Operators used by Telco Partners

      ACM

            phuet1@redhat.com Philippe Huet
            dvassili@redhat.com Demetris Vassiliades