  1. OpenShift Over the Air
  2. OTA-684

Canary Rollouts to speed up upgrade times


    • Type: Story
    • Resolution: Unresolved
    • Priority: Major

      Goal

          • Rule of thumb for the number of nodes in a pool = spare capacity percentage (of overall total capacity) × number of nodes in the cluster.
          • This number needs to be adjusted downwards.
        • Set maxUnavailable to the number of workers in the node pool.
        • Pause all node pools.
        • Unpause the target node pool.
        • Repeat until all node pools are upgraded.
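
      The steps above could be sketched roughly as follows. This is only an illustration, not an agreed design: the helper names (pool_size, patch_cmd, rollout_plan) and the pool names are hypothetical, rounding down is one possible reading of "adjusted downwards", and the generated `oc patch` commands assume the node pools are MachineConfigPools with the `spec.maxUnavailable` and `spec.paused` fields.

```python
import json
import math


def pool_size(total_nodes: int, spare_capacity_pct: float) -> int:
    """Rule of thumb: spare capacity fraction times node count.

    Rounded down (assumption), but never below one node.
    """
    return max(1, math.floor(total_nodes * spare_capacity_pct / 100))


def patch_cmd(pool: str, spec: dict) -> str:
    """Build an `oc patch` command string for one pool (not executed here)."""
    patch = json.dumps({"spec": spec})
    return f"oc patch mcp/{pool} --type merge --patch '{patch}'"


def rollout_plan(pools: dict[str, int]) -> list[str]:
    """Return the ordered steps; `pools` maps pool name -> worker count."""
    steps = []
    # 1. Let every worker in a pool be unavailable at the same time.
    for name, workers in pools.items():
        steps.append(patch_cmd(name, {"maxUnavailable": workers}))
    # 2. Pause all node pools.
    for name in pools:
        steps.append(patch_cmd(name, {"paused": True}))
    # 3. Unpause one target pool at a time; repeat until all are upgraded.
    for name in pools:
        steps.append(patch_cmd(name, {"paused": False}))
        steps.append(f"# wait until pool {name} is fully upgraded")
    return steps


if __name__ == "__main__":
    size = pool_size(total_nodes=100, spare_capacity_pct=10)
    # Hypothetical pool names, each holding `size` workers.
    for step in rollout_plan({"pool-a": size, "pool-b": size}):
        print(step)
```

      The plan is produced as plain strings so it can be reviewed before anything is run against a cluster.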

      Why is this important?

      Significant improvements in upgrade times are possible. The number of pod restarts is also reduced.

      For example, if a 100-node cluster is split into 10-node pools, all 10 nodes in a pool update in parallel, so each pool takes only about 10 minutes and the overall time is reduced to 100 minutes (10 pools × 10 minutes).
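
      The arithmetic in this example can be checked with a small sketch. The function name is hypothetical, and the model assumes that every pool takes the same fixed time and that pools are upgraded strictly one after another. The one-node-at-a-time baseline is an added assumption for comparison, not a figure from this story.

```python
import math


def pooled_update_minutes(total_nodes: int, pool_size: int,
                          minutes_per_pool: float) -> float:
    """Estimate total update time when pools are upgraded sequentially
    and all nodes inside one pool are upgraded in parallel."""
    pools = math.ceil(total_nodes / pool_size)
    return pools * minutes_per_pool


if __name__ == "__main__":
    # 100 nodes in 10-node pools, about 10 minutes per pool.
    print(pooled_update_minutes(100, 10, 10))  # 100
    # Assumed baseline: one node at a time, about 10 minutes per node.
    print(pooled_update_minutes(100, 1, 10))   # 1000
```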

      Scenarios

      1. ...

      Acceptance Criteria

      • CI - MUST be running successfully with tests automated
      • Release Technical Enablement - Provide necessary release enablement details and documents.
      • ...

      Dependencies (internal and external)

      1. ...

      Previous Work (Optional):

      1. https://docs.openshift.com/container-platform/4.10/updating/update-using-custom-machine-config-pools.html

      Open questions:

      1. How do PodDisruptionBudgets (PDBs) impact this?
      2. Can the number of pod restarts be reduced further, given the new scheduler functionality that makes evicted pods land on previously upgraded worker pools? Should we mention that in the documentation?

            lmohanty@redhat.com Lalatendu Mohanty
            tkatarki@redhat.com Tushar Katarki