Agent-based Installer for OpenShift / AGENT-989

Allow 5-node control planes in day 1 with Agent-Based Installer

    • BU Product Work
    • To Do
    • OCPSTRAT-1219 - Allow 5-node control planes in day 1 with Agent-Based Installer
    • 0% To Do, 33% In Progress, 67% Done

      11/21/24 - Green

      The epic is dev-complete. Testing and adding a periodic CI job are the remaining tasks.

      11/13/24 - Green

      The assisted-service PR to allow 4/5 control plane replicas is still under review. CI jobs for 5 control plane replicas have been added to the assisted repos. The remaining CI jobs for the installer and dev-scripts, and the CI jobs for 4 control plane replicas, are on hold until the assisted-service PR merges.

      11/08/24 - Green

      5 control plane replicas now work with the latest assisted-service PR update. The installer/ABI change that removes the validations blocking 4/5 control plane replicas has merged. CI jobs to test this configuration are under review.

      10/31/24 - Orange/Red

      Unable to deploy a 5 control plane replica cluster using the latest assisted-service PR. There was a misunderstanding here: ABI does not define roles for hosts. Waiting for the next assisted-service PR update. Continuing to work on the contingency plan. Posted PRs to enable 4 and 5 control plane replicas in ABI and to support them in dev-scripts.

      10/23/24 - Orange/Red

      Tested the assisted-installer WIP patch with ABI. Deploying a 5 control plane node cluster results in 3 masters and 2 workers. This happens because assisted-service currently has no internal representation of 5 control plane nodes, only HighAvailabilityMode, which means 3 control plane nodes. Because the assisted-installer patch is still WIP, we will need to see whether proper support is added. The assisted-installer team is aware of this issue. The agent team is also working on a contingency plan in which the agent-based installer client fixes the host roles before starting the cluster installation.
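The contingency plan above (the client pins host roles before installation instead of relying on auto-assignment) can be sketched as follows. This is a hypothetical illustration only: the `Host` record, the `assign_roles` function, and the choice to order hosts by hostname are assumptions, not the ABI or assisted-service implementation.

```python
from dataclasses import dataclass


@dataclass
class Host:
    # Hypothetical host record; the real assisted-service host model differs.
    hostname: str
    role: str = "auto-assign"


def assign_roles(hosts, control_plane_replicas):
    """Pin the first `control_plane_replicas` hosts as masters, the rest as workers.

    Hypothetical sketch: explicit roles are set client-side so that a
    5-replica request does not collapse to 3 masters + 2 workers.
    """
    if control_plane_replicas > len(hosts):
        raise ValueError(
            f"need {control_plane_replicas} control plane hosts, got {len(hosts)}"
        )
    # Assumption: sort by hostname so the assignment is deterministic across runs.
    ordered = sorted(hosts, key=lambda h: h.hostname)
    for index, host in enumerate(ordered):
        host.role = "master" if index < control_plane_replicas else "worker"
    return ordered


if __name__ == "__main__":
    hosts = [Host(f"node-{i}") for i in range(5)]
    for host in assign_roles(hosts, control_plane_replicas=5):
        print(host.hostname, host.role)
```

Setting roles explicitly removes the dependency on how the service interprets "high availability", at the cost of the client having to decide which physical hosts become control plane nodes.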

      10/16/24 - Green

      The assisted-installer team has a WIP patch. Will begin removing validations from ABI this week.

      10/9/24 - Orange

      No change from last week. Changing the status color to orange because we are still waiting for assisted-installer to complete their change. Will begin removing validations from ABI next sprint.

      10/2/24 - Green

      Completed task breakdown and estimation. Waiting for assisted-installer to remove their validations blocking 4/5 control plane nodes so that ABI can be tested.


      Epic Goal

      • Enable users to install a 4/5-node control plane on day 1.

      Why is this important?

      • Users want more resilient clusters that are deployed across two sites in 3+2 or 2+2 topologies.

      Scenarios

      1. In a 2+1 compact cluster deployment across two sites, the failure of the site with 2 control plane nodes leaves a single control plane node for recovery. Having an extra node on the surviving site gives better odds that the cluster can be recovered if another node fails.
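The arithmetic behind this scenario can be checked with the standard etcd (Raft) majority rule, quorum = floor(n / 2) + 1. The sketch below assumes the worst case, where the site holding more control plane nodes fails; the function names are illustrative only.

```python
def etcd_quorum(members):
    """etcd (Raft) needs a strict majority of members: floor(n / 2) + 1."""
    return members // 2 + 1


def survive_site_loss(site_a, site_b):
    """Worst case: the larger site fails. Return (survivors, quorum, has_quorum)."""
    total = site_a + site_b
    survivors = min(site_a, site_b)
    quorum = etcd_quorum(total)
    return survivors, quorum, survivors >= quorum


for layout in [(2, 1), (2, 2), (3, 2)]:
    survivors, quorum, ok = survive_site_loss(*layout)
    print(f"{layout[0]}+{layout[1]}: {survivors} survivor(s), quorum {quorum}, quorum kept: {ok}")
```

In all three layouts the surviving site is below quorum, so recovery is manual in every case. The difference is that 2+2 and 3+2 leave two surviving control plane nodes instead of one, so a second failure during recovery is not immediately fatal.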

      Acceptance Criteria

      • CI - MUST be running successfully with tests automated
      • Release Technical Enablement - Provide necessary release enablement details and documents.
      • Users can set control plane replicas to 4 or 5.
      • Any number of worker nodes can be added to a control plane with 4 or 5 replicas.
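The two replica-related acceptance criteria can be expressed as a simple check against the install-config fields `controlPlane.replicas` and `compute[].replicas`. This is a hypothetical sketch, not the installer's validation code: the set of supported values assumes that 1 and 3 were already allowed, and other platform or topology rules are deliberately left out.

```python
from __future__ import annotations

# Assumption: 1 and 3 were the previously supported sizes; 4 and 5 are new.
SUPPORTED_CONTROL_PLANE_REPLICAS = {1, 3, 4, 5}


def validate_replicas(control_plane_replicas: int, worker_replicas: int) -> list[str]:
    """Return validation errors; an empty list means the counts are accepted.

    Hypothetical check: worker_replicas stands for the summed compute[].replicas.
    """
    errors = []
    if control_plane_replicas not in SUPPORTED_CONTROL_PLANE_REPLICAS:
        errors.append(
            f"controlPlane.replicas={control_plane_replicas} is not one of "
            f"{sorted(SUPPORTED_CONTROL_PLANE_REPLICAS)}"
        )
    # Acceptance criterion: any number of workers may join a 4- or 5-node control plane.
    if worker_replicas < 0:
        errors.append(f"compute replicas must be non-negative, got {worker_replicas}")
    return errors


if __name__ == "__main__":
    print(validate_replicas(5, 0))   # []
    print(validate_replicas(4, 12))  # []
    print(validate_replicas(6, 2))
```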

      Dependencies (internal and external)

      1. https://issues.redhat.com/browse/MGMT-18588
      2. https://issues.redhat.com/browse/OCPSTRAT-1199

      Previous Work (Optional):

      1. N/A

      Open questions:

      1. Should this be made available only for the baremetal platform?

      Done Checklist

      • CI - CI is running, tests are automated and merged.
      • Release Enablement <link to Feature Enablement Presentation>
      • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
      • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
      • DEV - Downstream build attached to advisory: <link to errata>
      • QE - Test plans in Polarion: <link or reference to Polarion>
      • QE - Automated tests merged: <link or reference to automated tests>
      • DOC - Downstream documentation merged: <link to meaningful PR>

              rwsu1@redhat.com Richard Su
              rwsu1@redhat.com Richard Su
              Biagio Manzari Biagio Manzari