-
Feature
-
Resolution: Unresolved
-
Critical
-
Strategic Product Work
-
OCPSTRAT-1542 Two Node OpenShift topologies for edge customers
-
100% To Do, 0% In Progress, 0% Done
-
XL
-
0
Feature Overview (aka. Goal Summary)
Edge customers who need on-site computing to serve business applications (e.g., point of sale, security & control applications, AI inference) are asking for a 2-node HA solution for their environments. They want only two nodes at the edge because a third node adds too much cost, but they still need HA for critical workloads. To address this need, a 2+1 topology is introduced. It supports a small, cheap arbiter node that can optionally be remote/virtual to reduce on-site hardware cost.
Goals (aka. expected user outcomes)
Support OpenShift on a 2+1 topology, meaning two primary nodes with large capacity to run the workload and control plane, and a third small “arbiter” node which ensures quorum. See requirements for more details.
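The role of the arbiter can be illustrated with majority-quorum arithmetic. The following is a minimal sketch, assuming the control plane store (etcd) needs a strict majority of its voting members to make progress; the function names are illustrative and not part of any OpenShift API.

```python
def quorum_size(members: int) -> int:
    """Smallest strict majority of a voting group with `members` voters."""
    if members < 1:
        raise ValueError("a cluster needs at least one voting member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """Number of voters that can be lost while a majority is still reachable."""
    return members - quorum_size(members)


# Assumption: each control plane node (including the arbiter) is one voter.
TWO_NODES_NO_ARBITER = 2
TWO_PLUS_ONE = 3
```

Under these assumptions, `tolerated_failures(TWO_NODES_NO_ARBITER)` is 0 and `tolerated_failures(TWO_PLUS_ONE)` is 1, which is why a plain two-node cluster cannot survive a single node outage while the 2+1 topology can.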
Requirements (aka. Acceptance Criteria):
- Co-located arbiter node - third node in the same network/location with low-latency network access, but the arbiter node is much smaller than the two main nodes. Target resource requirements for the arbiter node: 4 cores / 8 vCPU, 16G RAM, 120G disk (non-spinning), 1x 1 GbE network port, no BMC
- OCP Virt fully functional, incl. live migration of VMs (assuming an RWX CSI driver is available)
- Single Node outage is handled seamlessly
- In case the arbiter node is down, a reboot/restart of the two remaining nodes has to work, i.e. the two remaining nodes regain quorum and spin up the workload.
- Scale out of the cluster by adding additional worker nodes should be possible
- Transition the cluster into a regular 3 node compact cluster, e.g. by adding a new node as control plane node, then removing the arbiter node, should be possible
- Regular workload should not be scheduled to the arbiter node (e.g. by making it unschedulable, or by introducing a new node role “arbiter”). Only essential control plane workload (etcd components) should run on the arbiter node. Non-essential control plane workload (i.e. router, registry, console, monitoring etc.) should also not be scheduled to the arbiter node.
- It must be possible to explicitly schedule additional workload to the arbiter node. That is important for third-party solutions (e.g. storage providers) which also have quorum-based mechanisms.
- Must integrate seamlessly into existing installation/update mechanisms, esp. zero-touch provisioning.
- Added: ability to track OLA usage in the fleet of connected clusters via OCP telemetry data
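The two scheduling requirements above (keep regular workload off the arbiter, but allow explicit opt-in) match the behaviour of Kubernetes taints and tolerations. The sketch below is a simplified Python model of that matching, not the real scheduler: it only considers `NoSchedule` taints matched by key, and the taint key `example.com/arbiter` is a hypothetical placeholder, since the ticket leaves the actual mechanism open.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical taint key; the real mechanism (new node role, unschedulable
# flag, or something else) is an open question in this feature.
ARBITER_TAINT_KEY = "example.com/arbiter"


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str = "NoSchedule"


@dataclass
class Node:
    name: str
    taints: list[Taint] = field(default_factory=list)


@dataclass
class Pod:
    name: str
    tolerated_keys: set[str] = field(default_factory=set)


def can_schedule(pod: Pod, node: Node) -> bool:
    """A pod fits only if it tolerates every NoSchedule taint on the node."""
    return all(
        taint.key in pod.tolerated_keys
        for taint in node.taints
        if taint.effect == "NoSchedule"
    )
```

In this model, a pod with no tolerations lands only on the two primary nodes, while etcd or a third-party storage quorum component that lists the arbiter key in `tolerated_keys` can also be placed on the arbiter.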
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | self-managed |
Classic (standalone cluster) | yes |
Hosted control planes | no |
Multi node, Compact (three node), or Single node (SNO), or all | Multi node and Compact (three node) |
Connected / Restricted Network | both |
Architectures, e.g. x86_64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | x86_64 and ARM |
Operator compatibility | full |
Backport needed (list applicable versions) | no |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | no |
Other (please specify) | n/a |
Questions to Answer (Optional):
- How to implement the scheduling restrictions to the arbiter node? New node role “arbiter”?
- Can this be delivered in one release, or do we need to split, e.g. TechPreview + GA?
Out of Scope
- Storage driver providing RWX shared storage
- …
Background
Provide any additional context as needed to frame the feature. Initial completion during Refinement status.
- Two node support is in high demand by telco, industrial and retail customers.
- VMware supports a two-node vSAN solution: https://core.vmware.com/resource/vsan-2-node-cluster-guide
- Example hardware frequently used for edge deployments with a co-located small arbiter node: the Dell PowerEdge XR4000z server, an edge computing device that allows restaurants, retailers, and other small to medium businesses to set up local computing for data-intensive workloads.
Customer Considerations
See requirements - there are two main groups of customers: those using a co-located arbiter node, and those using a remote arbiter node.
Documentation Considerations
- Topology needs to be documented, esp. the requirements of the arbiter node.
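One way to make the arbiter requirements checkable is to record the target sizing as data. The following is a minimal sketch that takes the figures from the requirements list (8 vCPU, 16G RAM, 120G non-spinning disk, one 1 GbE port) and treats them as lower bounds, which is an assumption; the class and function names are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    vcpus: int
    ram_gb: int
    disk_gb: int
    disk_is_spinning: bool
    nic_ports: int  # number of network ports at 1 GbE or faster


# Target arbiter sizing as stated in the requirements; treating these
# values as minimums is an assumption of this sketch.
ARBITER_TARGET = NodeSpec(
    vcpus=8, ram_gb=16, disk_gb=120, disk_is_spinning=False, nic_ports=1
)


def shortfalls(candidate: NodeSpec, target: NodeSpec = ARBITER_TARGET) -> list[str]:
    """Return a human-readable list of ways the candidate misses the target."""
    problems = []
    if candidate.vcpus < target.vcpus:
        problems.append(f"vCPUs: {candidate.vcpus} < {target.vcpus}")
    if candidate.ram_gb < target.ram_gb:
        problems.append(f"RAM: {candidate.ram_gb}G < {target.ram_gb}G")
    if candidate.disk_gb < target.disk_gb:
        problems.append(f"disk: {candidate.disk_gb}G < {target.disk_gb}G")
    if candidate.disk_is_spinning and not target.disk_is_spinning:
        problems.append("disk: spinning media, non-spinning required")
    if candidate.nic_ports < target.nic_ports:
        problems.append(f"network ports: {candidate.nic_ports} < {target.nic_ports}")
    return problems
```

An empty list from `shortfalls` means the candidate node meets the documented target under these assumptions.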
Interoperability Considerations
- OCP Virt needs to be explicitly tested in this scenario to support VM HA (live migration, restart on other node)
- clones
-
OCPSTRAT-1500 Support 2+1 node Openshift cluster with Local Arbiter (OLA) - Tech Preview
- In Progress