OpenShift Over the Air / OTA-922

In Docs, motivate duration of control-plane component updates

    • Type: Story
    • Resolution: Done
    • Priority: Minor

      Epic Goal

      OSDOCS-3300 added docs like:

      The Cluster Version Operator (CVO) retrieves the target update release image and applies [it] to the cluster. All components which run as pods are updated during this phase, whereas the host components are updated by the Machine Config Operator (MCO). This process might take 60 to 120 minutes.

      That's a long time, and folks may wonder why it takes so long. OTA-474 is bringing in some documentation explaining runlevels, manifest name patterns, and so on. That will establish that "an update involves a combination of serial and parallel pivots". This ticket aims to follow up with an explanation of how we get from there to the roughly one-hour timespan. But we want that explanation not to go stale quickly. We can probably pick out a few slow operators (Kube API server, networking, and DNS) and explain why each takes some time to perform a zero-disruption update, without making brittle commitments about the order in which those changes roll out.
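The "combination of serial and parallel pivots" idea can be illustrated with a toy model. This is a minimal sketch under a loudly simplifying assumption, not how the CVO actually schedules manifests: updates proceed through stages one after another, components within a stage update concurrently, so each stage lasts as long as its slowest component. The function name `wall_clock_minutes` and the component names and durations in the example are hypothetical.

```python
from typing import Mapping, Sequence


def wall_clock_minutes(stages: Sequence[Mapping[str, float]]) -> float:
    """Toy estimate of update wall-clock time, in minutes.

    Simplifying assumption (not the CVO's real scheduler): stages run
    one after another, and the components inside one stage update in
    parallel, so each stage lasts as long as its slowest component.
    """
    return float(sum(max(stage.values(), default=0) for stage in stages))


# Hypothetical stages and durations, for illustration only.
example = [
    {"component-a": 5, "component-b": 3},  # parallel: this stage takes 5
    {"component-c": 4},                    # then this stage takes 4
]
print(wall_clock_minutes(example))  # 9.0
```

Under this model, putting a component in its own serial stage adds its full duration, while adding it to an existing stage only matters if it becomes the slowest member. That is one way to see why a few slow operators can dominate the total without the docs having to commit to a specific rollout order.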

            Sebastian Kopacz added a comment:

            Here's the final designed image that will be added to the docs as part of OSDOCS-7104:

            (image not included)

            Since the image is ready to use and the doc implementation is underway, I think this ticket can be closed as complete.

            Sebastian Kopacz added a comment (edited):

            I've created an OSDOCS issue to track the documentation implementation: OSDOCS-7104

            Lalatendu Mohanty added a comment (edited):

            We are going to convert this into a Jira card and attach it to an existing doc epic, OTA-810.

            Lalatendu Mohanty added a comment:

            Changing the priority of this epic to Minor because OTA-474 will deliver more than 90% of the ask of this epic.

            W. Trevor King added a comment:

            Re-opened so we can follow up on OTA-474 with some additional information, without scope-creeping OTA-474. Looking at CI:

            1. https://docs.ci.openshift.org/docs/getting-started/useful-links/
            2. OCP AMD 64 release status page: https://amd64.ocp.releases.ci.openshift.org/
            3. 4.12.6: https://amd64.ocp.releases.ci.openshift.org/releasestream/4-stable/release/4.12.6
            4. Pick an update from 4.11.30: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-e2e-gcp-ovn-upgrade/1630960554711453696
            5. Open an intervals chart, and check the Progressing=True section.

            From that, kube-apiserver took 9m, network took 9m (before the nodes started rolling), and machine-config took 24m. Adding those up already gets us 42 minutes, a healthy chunk of an hour, and we can wave our hands around for "and there's other stuff too"; that may be sufficient.
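The arithmetic above can be checked against the 60 to 120 minute range quoted from OSDOCS-3300. This sketch uses only the three durations from that single 4.11.30 to 4.12.6 CI run, and assumes, as a simplification, that the three operators did not overlap in time, so their durations simply add; the variable names are illustrative.

```python
# Progressing=True durations (minutes) from one 4.11.30 -> 4.12.6 CI update
# run (gcp-ovn upgrade job); a single sample, not a benchmark.
observed = {
    "kube-apiserver": 9,
    "network": 9,  # measured before the nodes started rolling
    "machine-config": 24,
}

# Range stated in the OSDOCS-3300 docs text for this update phase.
DOC_RANGE_MINUTES = (60, 120)

# Assumes the three operators did not overlap, so durations add.
accounted = sum(observed.values())
for bound in DOC_RANGE_MINUTES:
    print(f"{bound} min window: {accounted} min accounted for, "
          f"{bound - accounted} min for everything else")
# 60 min window: 42 min accounted for, 18 min for everything else
# 120 min window: 42 min accounted for, 78 min for everything else
```

If some of these operators did progress concurrently, their combined wall-clock contribution would be smaller than 42 minutes, leaving more of the window for the "other stuff".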

            Lalatendu Mohanty added a comment:

            Closing this as OTA-474 will cover the same.

              People: Sebastian Kopacz (rhn-support-skopacz), Subin M (rh-ee-smodeel), Jia Liu
              Votes: 0
              Watchers: 5