OpenShift Container Platform (OCP) Strategy / OCPSTRAT-2251

[GA] Agent-Installer Installation UI for OpenShift Virtualization


    • Product / Portfolio Work
    • VIRTSTRAT-60 Installer to provide a fully functional virtualization cluster

      The installation process for the OpenShift Virtualization Engine (OVE) has been identified as a critical area for improvement to address customer concerns regarding its complexity compared to competitors like VMware, Nutanix, and Proxmox. Customers often struggle with disconnected environments, operator configuration, and managing external dependencies, making the initial deployment challenging and time-consuming. 

      To resolve these issues, the goal is to deliver a streamlined, opinionated installation workflow that leverages existing tools like the Agent-Based Installer, the Assisted Installer, and the OpenShift Appliance (all sharing the same underlying technology) while pre-configuring essential operators and minimizing dependencies, especially the need for an image registry before installation.

      Additionally, the offered functionality must cover Day-2 operations (upgrades and adding new nodes) in a disconnected environment without an external registry.

      By focusing on enterprise customers, particularly VMware administrators working in isolated networks, this effort aims to provide a user-friendly, UI-based installation experience that simplifies cluster setup and ensures quick time-to-value.

      Objectives and Goals

      Primary Objectives

      • Simplify the OpenShift Virtualization installation and Day-2 (upgrade and adding new nodes) processes to reduce complexity for enterprise customers coming from VMware vSphere.
      • Enable installation, upgrade, and adding new nodes in disconnected environments with minimal prerequisites.
      • Eliminate the dependency on a pre-existing image registry in disconnected installations, upgrades and adding new nodes.
      • Provide a user-friendly, UI-driven installation/upgrade experience for users used to VMware vSphere.

      Goals

      • Deliver an installation experience leveraging existing tools like the Agent-Based Installer, Assisted Installer, and OpenShift Appliance, i.e. the Assisted Service.
      • Pre-configure essential operators for OVE and minimize external Day-1 dependencies (see OCPSTRAT-1811 "Agent Installer interface to install Operators").
      • Ensure successful installation, upgrade, and addition of new nodes in disconnected environments with standalone OpenShift, with minimal requirements and no pre-existing registry.

      Personas

      Primary Audience 

      VMware administrators transitioning to OpenShift Virtualization in isolated/disconnected environments.

      Pain Points

      • Lack of UI-driven workflows; writing YAML files is a barrier for the target user (virtualization platform admins).
      • Complex setup requirements (e.g., image registries in disconnected environments).
      • Difficulty in configuring network settings interactively.
      • Uncertainty about when to use a specific installation method.
      • Difficulty finding the relevant installation method (in the docs or at console.redhat.com).

      Technical Requirements

      Use case:

      OpenShift installation, upgrade, and adding new nodes with the WebUI in a disconnected environment without a local image registry; all necessary components (RHCOS, OpenShift, and selected Day-2 operators) are part of the mounted ISO image. This covers all OpenShift deployment types:

      • Single Node OpenShift
      • 3-Node with additional worker nodes
      • Standard OpenShift cluster with dedicated control plane nodes
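
      As a rough illustration of how the three deployment types listed above differ, the following sketch classifies a requested node layout. The `ClusterShape` type, its fields, and the 3 to 5 control plane range are assumptions made for this example (the range follows the "4th and 5th control plane node" item in this issue); none of this is an installer API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterShape:
    """Hypothetical description of a requested cluster; not an installer API."""
    control_plane_nodes: int
    worker_nodes: int
    dedicated_control_plane: bool  # True if control plane nodes do not run workloads


def classify(shape: ClusterShape) -> str:
    """Map a node layout onto one of the three deployment types in this issue."""
    if shape.control_plane_nodes == 1 and shape.worker_nodes == 0:
        return "Single Node OpenShift"
    if shape.control_plane_nodes == 3 and not shape.dedicated_control_plane:
        # Control plane nodes also run workloads; extra workers are optional.
        return "3-Node with additional worker nodes"
    # Assumption: 3 to 5 dedicated control plane nodes are allowed.
    if 3 <= shape.control_plane_nodes <= 5 and shape.dedicated_control_plane:
        return "Standard OpenShift cluster with dedicated control plane nodes"
    raise ValueError(f"unsupported layout: {shape}")
```

      For example, `classify(ClusterShape(3, 2, False))` falls into the 3-Node category, while the same counts with `dedicated_control_plane=True` are treated as a standard cluster.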

      Additionally, support for the x86_64 architecture is required (Aarch64, S390 and IBM-Z can be delivered as separate features after GA).

      Image Registry Simplification

      • Eliminate the dependency on an existing external image registry for disconnected environments.
      • Support a workflow similar to the OpenShift Appliance model, where users can deploy a cluster without external dependencies.

      Agent-Based Installer Enhancements

      • Extend the existing UI to capture all essential data points (e.g., cluster details, network settings, storage configuration) without requiring YAML files.
      • Install without a pre-existing registry in a disconnected environment.
      • Install required operators for virtualization (OpenShift Virtualization Reference Implementation Guide v1.0.2)
      • List of Operators:
        • OpenShift Virtualization Operator
        • Node Health Check (NHC) Operator
        • Fence Agents Remediation (FAR) Operator
        • Node Maintenance Operator (NMO)
        • Cluster Observability Operator (COO) and enable OpenShift Logging
        • MetalLB
        • Migration Toolkit for Virtualization (MTV)
        • Migration Toolkit for Containers (MTC)
        • Kube Descheduler Operator
        • NUMA Resources Operator
        • NMState Operator
        • Self Node Remediation (SNR) Operator 
        • OADP
      • Note: each operator owner needs to enable installation of their operator via the installer. The release will not be blocked if the full list of operators is not included; operators will be added as required and prioritized with each team.
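
      Selecting operators from a predefined list could be checked along the following lines. This is a minimal sketch: the names are the display names from the list above, and `PREDEFINED_OPERATORS` and `validate_selection` are hypothetical helpers, not part of the Agent-Based Installer or the Assisted Service.

```python
# Display names copied from the operator list in this issue.
PREDEFINED_OPERATORS = frozenset({
    "OpenShift Virtualization Operator",
    "Node Health Check (NHC) Operator",
    "Fence Agents Remediation (FAR) Operator",
    "Node Maintenance Operator (NMO)",
    "Cluster Observability Operator (COO)",
    "MetalLB",
    "Migration Toolkit for Virtualization (MTV)",
    "Migration Toolkit for Containers (MTC)",
    "Kube Descheduler Operator",
    "NUMA Resources Operator",
    "NMState Operator",
    "Self Node Remediation (SNR) Operator",
    "OADP",
})


def validate_selection(selected):
    """Return the de-duplicated, sorted selection, or raise on unknown names."""
    chosen = set(selected)
    unknown = sorted(chosen - PREDEFINED_OPERATORS)
    if unknown:
        raise ValueError("not in the predefined list: " + ", ".join(unknown))
    return sorted(chosen)
```

      Rejecting unknown names up front fits the registryless model, since only operators shipped in the ISO image can be installed.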

      User experience requirements

      The first area of focus is disconnected environments. We target these environments with the Agent-Based Installer.

      The current docs for installing in a disconnected environment are very long and hard to follow.

      General

      • Productization of OCPSTRAT-1985 TP features
        • Assisted UI
        • Appliance
      • Build Release pipeline for generic ISO
      • Build Release pipeline for “above-the-sea” UI
      • Support for OLM operator

      Connected phase

      • It will be a new option in the Assisted Installer wizard.
      • Selecting the new option, a user will receive a simple wizard leading to a customized ISO download.
      • ISO will be generated upfront for each Z-release with the latest version of Day-2 operators:
        • Minimum x86_64 architecture

      Disconnected phase

      • Installation:
        • Deploy: SNO, 3-Node with additional workers, Standard Cluster with dedicated control plane nodes
          • Including 4th and 5th control plane node
        • Possibility to select operators from the predefined list during the installation process
      • Upgrade/update:
        • A Y-release upgrade may require an intermediate update from 4.Y.Z to 4.Y.Z’, where Z’ is a release recommended for upgrading from Y to Y+1 (e.g. 4.20.5 -> 4.20.25 -> 4.21.3).
        • Delivering this functionality in a disconnected and registryless environment may require more than one OpenShift release cycle.
        • Update from one Z-release to another under the same Y-release (e.g. 4.20.5 -> 4.20.10)
        • Upgrade from Y-release to Y+1-release
        • Upgrade/Update without an external registry
          • Infrastructure
          • OLM operators
      • Expand the cluster with new nodes (scale out and node failure scenarios)
        • Scale-out scenarios: adding worker nodes to the deployments below:
          • 3-Node with 0..n workers
          • Standard OpenShift cluster with dedicated control plane nodes
        • Failure scenarios:
          • Worker node failure
          • Control Plane node failure (minimum - document the manual procedure)
        • Adding new nodes without an external registry
      • Security:
        • Authorisation and authentication of Assisted Service on a rendezvous node
        • Admin - AI-UI session protection (HTTPS, credentials to access UI) during a cluster installation
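
      The intermediate-step rule from the Upgrade/update item above (4.Y.Z to 4.Y.Z’ before moving to Y+1) can be sketched as a small path planner. This is an illustration only: `plan_upgrade` and its `recommended_z` parameter are hypothetical, and the recommended Z’ release is taken as an input rather than looked up from any real update graph.

```python
from typing import List, Tuple

Version = Tuple[int, int, int]


def parse(version: str) -> Version:
    """Parse 'X.Y.Z' into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def plan_upgrade(current: str, target: str, recommended_z: int) -> List[str]:
    """Return the ordered versions to visit, starting with the current one.

    recommended_z is the Z' release of the current Y stream that is
    recommended before upgrading to Y+1 (an assumed input for this sketch).
    """
    cur, tgt = parse(current), parse(target)
    if tgt <= cur:
        raise ValueError("target must be newer than the current version")
    if tgt[0] != cur[0] or tgt[1] - cur[1] > 1:
        raise ValueError("only Z-release updates and Y to Y+1 upgrades are covered")
    path = [cur]
    # A Y to Y+1 upgrade first moves to the recommended Z' if not there yet.
    if tgt[1] == cur[1] + 1 and cur[2] < recommended_z:
        path.append((cur[0], cur[1], recommended_z))
    path.append(tgt)
    return [".".join(str(n) for n in version) for version in path]
```

      With the example from this issue, `plan_upgrade("4.20.5", "4.21.3", recommended_z=25)` yields `["4.20.5", "4.20.25", "4.21.3"]`, while a Z-release update such as 4.20.5 to 4.20.10 needs no intermediate step.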

      Definition of done

      • Functionality described above delivered, tested by QE, and documented.

      Open questions

      • Installation monitoring: once the rendezvous node is rebooted, several minutes may pass before the console becomes available. During this “blackout” period, the user has no way to detect a possible issue.
      • Disaster Recovery

      Out of scope

      • A shared storage solution, which should be provided by the customer

      Links:

              mzasepa Michal Zasepa
              linnguye.openshift Linh Nguyen
              Stephanie Stout Stephanie Stout