OpenShift Top Level Product Strategy / OCPPLAN-7811

Improve MCO Upgrades and Maintainability


    • Feature
    • Resolution: Done
    • Major
    • openshift-4.10

      [Sept 3. Note: this might need to be broken into 2 issues]

      Feature Overview and Background

      The Machine Config Operator (MCO) team has reported several classes of MCO issues that can generate support cases or block upgrades. Additionally, the team believes that certain changes will improve the maintainability of the code and make it easier to troubleshoot.

      4.7 Phase - Wait for All Worker Pools on Upgrade

      • Today, when an upgrade is initiated, the Cluster Version Operator (CVO) reports that the upgrade is complete once the master pool has been upgraded.
      • Other pools may not have completed due to an upgrade problem or a perfectly valid condition like pausing reconciliation on one or more pools.
      • Whether the cause is unintentional (an error prevents a worker from upgrading) or intentional (a pool is paused), the administrator can initiate another upgrade before the previous one has been rolled out to the full cluster.
      • Why this is important
        • Cluster administrators can end up in a state where the cluster reports that it is upgraded when, in fact, it is only partially upgraded. The result is a cluster that sits somewhere between releases, especially on the compute side. We want to avoid minor-version skew between control plane and compute nodes (z-stream skew, by contrast, is acceptable for k8s). This will reduce the number of bug reports the team receives in cases where the admin started an upgrade that degraded the compute pool, did not notice, and moved on to another upgrade, leaving compute at 4.(y-2).
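
The difference between the current and the desired completion check can be illustrated with a small sketch. This is not MCO or CVO code: the `Pool` class, its fields, and every function name below are hypothetical and only loosely modeled on the idea of a machine config pool that tracks how many of its machines have been updated.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """Hypothetical, simplified view of a machine config pool."""
    name: str
    machine_count: int
    updated_machine_count: int
    version: str  # oldest release still running on any node in the pool
    paused: bool = False
    degraded: bool = False

    @property
    def updated(self) -> bool:
        return self.updated_machine_count == self.machine_count


def complete_master_only(pools):
    """Behavior described above: only the master pool is consulted."""
    return all(p.updated for p in pools if p.name == "master")


def complete_all_pools(pools):
    """Desired behavior: every pool must have finished rolling out."""
    return all(p.updated for p in pools)


def blocking_pools(pools):
    """Map each unfinished pool to a short reason to show the admin."""
    reasons = {}
    for p in pools:
        if p.updated:
            continue
        if p.paused:
            reasons[p.name] = "paused"
        elif p.degraded:
            reasons[p.name] = "degraded"
        else:
            reasons[p.name] = "in progress"
    return reasons


def minor_skew(control_plane_version, pool_version):
    """Minor-version (y-stream) distance between two x.y.z strings."""
    cp_minor = int(control_plane_version.split(".")[1])
    pool_minor = int(pool_version.split(".")[1])
    return cp_minor - pool_minor


# Assumed example state: masters are done, workers hit an error,
# and an "infra" pool was paused on purpose.
pools = [
    Pool("master", 3, 3, "4.7.0"),
    Pool("worker", 5, 2, "4.6.8", degraded=True),
    Pool("infra", 2, 0, "4.6.8", paused=True),
]
print(complete_master_only(pools))  # True
print(complete_all_pools(pools))    # False
print(blocking_pools(pools))
```

With this example state the master-only check reports completion while two pools are still behind, which is the situation that lets an administrator start the next upgrade too early. A second upgrade from that point would leave `minor_skew` at 2 for the stuck pools, matching the 4.(y-2) case described above.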

      Future work

      Fault Tolerant MCD: https://issues.redhat.com/browse/GRPA-2682

      Best Effort Upgrade on Degraded MCO: https://issues.redhat.com/browse/GRPA-1641

      Rework Kubeletconfig and Containerruntimeconfig Controllers: https://issues.redhat.com/browse/GRPA-2679

      Validate pullsecret before writing it: https://issues.redhat.com/browse/GRPA-2699

      Also related: Bootimage Updates: https://issues.redhat.com/browse/GRPA-2680

       

            rhn-support-mrussell Mark Russell