OpenShift Container Platform (OCP) Strategy
OCPSTRAT-2002

Two Node OpenShift with Fencing (TNF) - Tech Preview

    • Product / Portfolio Work
    • OCPSTRAT-1542 Two Node OpenShift topologies for edge customers
    • 0% To Do, 0% In Progress, 100% Done
      Feature Overview (aka. Goal Summary)  

      Customers with large numbers of geographically dispersed locations want a container management solution with a two-node footprint. They require high availability, but even "cheap" third nodes represent a significant cost at this scale.

      Goals (aka. expected user outcomes)

      Two-node clustering is a solved problem in the traditional HA space. The goal of this feature is to introduce existing RHEL technologies into OpenShift to support a true two-node topology. This requires fencing to ensure node recovery. Hence the name: Two Node OpenShift with Fencing (TNF).
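
      Fencing here means power-cycling or isolating an unhealthy peer through an out-of-band channel before the survivor takes over. In TNF this is expected to be handled by the existing RHEL HA stack and its fence agents; the snippet below is only a minimal sketch of what a Redfish-based fence action looks like at the protocol level. The BMC address, credentials, and system ID are hypothetical placeholders, not part of this feature's design.

{code:python}
# Illustrative only: a Redfish "ForceOff" power action against a peer's BMC.
# BMC_URL, credentials, and the system path are placeholders, not real values.
import requests

BMC_URL = "https://bmc.example.internal"      # hypothetical BMC address
SYSTEM = "/redfish/v1/Systems/1"              # system ID varies by vendor
AUTH = ("fencing-user", "fencing-password")   # placeholder credentials

def fence_peer():
    """Force the peer node off via the standard Redfish reset action."""
    resp = requests.post(
        f"{BMC_URL}{SYSTEM}/Actions/ComputerSystem.Reset",
        json={"ResetType": "ForceOff"},
        auth=AUTH,
        verify=False,  # lab-only; production BMCs should present trusted certs
        timeout=30,
    )
    resp.raise_for_status()
    return resp.status_code

if __name__ == "__main__":
    print("Fence request returned HTTP", fence_peer())
{code}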

      Requirements (aka. Acceptance Criteria):

      1. Provide a true two-node OCP deployment.
      2. Support workloads in active/passive mode, i.e. a single-instance pod where the pods from the failed node are restarted on the second node in a timely manner, or a second pod is already running but passive, ready to take over if the first pod fails (e.g. a PostgreSQL database in an active/passive setup). This sees CPU utilisation of ~50% max.
      3. Support workloads in active/active mode. Both nodes share the load and are sized by design to run at about 60-75% of full capacity. During a failure there is an expectation of service degradation, but not a complete outage, so if one node fails the other node operates at close to 100%.
      4. Either both nodes have a fencing device (BMC via Redfish, IPMI etc., UPS via serial port), or there is a dedicated direct crossover cable between the nodes to drastically reduce the risk of split brain. Only BMC via Redfish at TP; other fencing devices probably post-GA.
      5. <60s failover time: if the leading node goes down, the remaining node takes over and reaches an operational (writable) state in less than 60s. The exact parameters (heartbeat interval, number of missed heartbeats, etc.) need to be configurable by users, e.g. to operate on a less aggressive timeline if required (avoiding unnecessary failovers due to blips/flukes). To be refined after initial numbers are observed during TP testing; see the timing sketch after this list.
      6. No shared storage between the nodes is required as a fencing device.
      7. Be able to scale out to a true three-node compact cluster as a day-2 operation (stretch goal, not required for MVP, but a constraint to keep in mind during design and implementation). The resulting cluster should have a three-node etcd quorum and the same architecture/support statement as a freshly installed three-node compact cluster. Out of scope for TP, and probably even for GA, as OCP currently does not support control plane topology changes.
      8. Be able to add worker nodes to a two-node cluster with fencing as a day-2 operation, like we support with SNO + worker nodes (stretch goal, not required for TP or GA).
      9. The solution fulfills the [k8s-etcd contract|https://docs.google.com/document/d/1NUZDiJeiIH5vo_FMaTWf0JtrQKCx0kpEaIIuPoj9P6A/edit#heading=h.tlkin1a8b8bl], so that layered mechanisms like Leases work correctly (see the Lease example after this list).
      10. Support full recovery of the workload when the node comes back online after restoration; total time <15 mins.
      11. x86_64 only in the initial release; aarch64 might be added later.
      12. Added: ability to track TNF usage in the fleet of connected clusters via OCP telemetry data (e.g. number of clusters with TNF topology).
      13. Added: be able to install OCP Virt and run VMs with node-local storage (e.g. LSO or LVMS) on both nodes. Deferred to GA.
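
      Requirement 5 amounts to a failover-time budget: detecting the failed peer, fencing it, and restoring a writable control plane must all fit inside 60 seconds. The sketch below is illustrative arithmetic only; the heartbeat interval, missed-heartbeat count, and fence/recovery durations are hypothetical placeholders, not TNF defaults, and are exactly the numbers to be refined during TP testing.

{code:python}
# Illustrative failover-time budget for requirement 5. All numbers are
# hypothetical placeholders; the real tunables and defaults are TBD for TNF.

heartbeat_interval_s = 1.0   # assumed heartbeat period
missed_heartbeats = 5        # assumed misses before declaring the peer dead
fence_duration_s = 20.0      # assumed time for the BMC to confirm power-off
etcd_recovery_s = 15.0       # assumed time to restore a writable etcd member

detection_s = heartbeat_interval_s * missed_heartbeats
total_s = detection_s + fence_duration_s + etcd_recovery_s

print(f"detection: {detection_s:.0f}s, fencing: {fence_duration_s:.0f}s, "
      f"etcd recovery: {etcd_recovery_s:.0f}s, total: {total_s:.0f}s")
assert total_s < 60, "budget exceeded: tighten timers or speed up fencing"
{code}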

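      One observable consequence of requirement 9 is that layered coordination mechanisms keep working across a failover; node heartbeats, for example, are recorded by the kubelet as Lease objects in the kube-node-lease namespace. A minimal read-only check with the Kubernetes Python client (assuming a reachable kubeconfig) could look like this:

{code:python}
# Minimal sketch: list node Leases and their last renew times.
# Assumes a kubeconfig that can reach the cluster; read-only access suffices.
from kubernetes import client, config

config.load_kube_config()
coordination = client.CoordinationV1Api()

for lease in coordination.list_namespaced_lease("kube-node-lease").items:
    spec = lease.spec
    print(f"{lease.metadata.name}: renewed {spec.renew_time}, "
          f"duration {spec.lease_duration_seconds}s")
{code}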
       

       

      Deployment considerations (list applicable specific needs; N/A = not applicable):
      Self-managed, managed, or both: Self-managed
      Classic (standalone cluster): yes
      Hosted control planes: n/a
      Multi node, Compact (three node), or Single node (SNO), or all: NEW: Two Node with Fencing
      Connected / Restricted Network: both
      Architectures, e.g. x86_64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x): x86_64, ARM
      Operator compatibility: full
      Backport needed (list applicable versions): no
      UI need (e.g. OpenShift Console, dynamic plugin, OCM): none
      Other (please specify): 

       

      Questions to Answer (Optional):

      1. ...

       Out of Scope

      1. Storage driver providing RWX shared storage
      2. ...

       Background

      • Two-node support is in high demand from telco, industrial, and retail customers.
      • StarlingX supports a true two-node topology (docs)

       Customer Considerations

      Telco Customer requirements:

      2-node HA control-plane requirements for Telco

       

      Documentation Considerations

      The topology needs to be documented, especially the requirements of the arbiter node.

       

      Interoperability Considerations

      1. OCP Virt needs to be explicitly tested in this scenario to support VM HA (live migration, restart on the other node); see the sketch below.
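
      For the live-migration part of that testing, the trigger in OCP Virt/KubeVirt is a VirtualMachineInstanceMigration object referencing the running VMI. The sketch below uses the Kubernetes Python client's custom-objects API; the VM name and namespace are placeholders, and on a two-node cluster with node-local storage live migration may additionally depend on the storage configuration.

{code:python}
# Sketch: trigger a live migration of a running VMI via the KubeVirt API.
# VM name and namespace are placeholders; requires a kubeconfig with rights
# to create VirtualMachineInstanceMigration objects.
from kubernetes import client, config

config.load_kube_config()
custom = client.CustomObjectsApi()

migration = {
    "apiVersion": "kubevirt.io/v1",
    "kind": "VirtualMachineInstanceMigration",
    "metadata": {"generateName": "tnf-test-migration-"},
    "spec": {"vmiName": "example-vm"},   # placeholder VMI name
}

custom.create_namespaced_custom_object(
    group="kubevirt.io",
    version="v1",
    namespace="default",                 # placeholder namespace
    plural="virtualmachineinstancemigrations",
    body=migration,
)
{code}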

       
