Type: Epic
Resolution: Obsolete
Priority: Major
Fix Version/s: None
Affects Version/s: None
Labels:

Epic Name: Verizon
Activity Type: Product / Portfolio Work
Parent Link: OCPSTRAT-218 Hive in split control plane for etcd scaling
Hierarchy Progress Bar: 50% To Do, 0% In Progress, 50% Done
Blocked: False
Blocked Reason:
This epic was automatically marked as blocked because the resolution for a subtask has been set to Won't Do (or Won't Fix), indicating a functional team cannot support this epic. If you believe this occurred in error, please reach out to the functional team for help in getting this work into their queue.
Ready: False
Color Status: Green
Size: XL
Target Version: openshift-4.13
Release Blocker: Approved

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

Epic Goal

Make hive understand two different "control plane API endpoints": one for Jobs, Pods, and anything else that needs to run on a worker node; the other for non-"runnable" objects and CRs such as ClusterDeployment, MachinePool, and Secret.
Some background: https://coreos.slack.com/archives/CE3ETN3J8/p1651246764652499

Why is this important?

ACM would like to run hive in a "split control plane" for etcd scaling purposes. See "Scenarios".

Scenarios

A customer wants to create and manage O(100k) spoke clusters from a single OCP cluster. Because of the number of objects each spoke cluster requires, etcd on a single ACM+hive stack can handle only O(1k) of them, so O(100) such stacks are needed. ACM proposes an architecture where:

ACM/MCE/Hive stacks are running in O(100) separate namespaces on a single OCP cluster. We'll call these "control planes".
Each control plane talks to a corresponding "data plane", a virtualized kube API server running in the same namespace, importantly with its own independent etcd.
A single "hub of hubs" ACM front end aggregates everything for management and observability. (Does this front end run in the big OCP cluster or elsewhere?) (Hive doesn't know/care about this layer.)

In this model, all the "runnables" (containers in pods from deployments/statefulsets including controllers; containers in pods created from jobs for e.g. provision/deprovision; maybe other things??) need to be created/reconciled in the control planes; but all the other objects – hive CRs, the Secrets hive consumes, etc. – need to be created/reconciled in the data plane corresponding to that control plane.
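
The split described above can be sketched as a simple lookup from object kind to plane. This is only an illustration: the `Plane` type, the `PlaneFor` function, and the contents of `runnableKinds` are hypothetical names chosen for this sketch, not hive code, and the list of runnable kinds is deliberately incomplete because the full set is still an open question.

```go
package main

import "fmt"

// Plane names the API server an object should live in.
// Hypothetical type for this sketch; not part of hive.
type Plane string

const (
	ControlPlane Plane = "control plane"
	DataPlane    Plane = "data plane"
)

// runnableKinds lists kinds that end up as containers on worker nodes.
// Assumption: this set is illustrative and not exhaustive.
var runnableKinds = map[string]bool{
	"Deployment":  true,
	"StatefulSet": true,
	"Job":         true,
	"Pod":         true,
}

// PlaneFor returns the plane in which an object of the given kind
// should be created and reconciled. Anything not known to be
// runnable defaults to the data plane.
func PlaneFor(kind string) Plane {
	if runnableKinds[kind] {
		return ControlPlane
	}
	return DataPlane
}

func main() {
	for _, kind := range []string{"Job", "ClusterDeployment", "MachinePool", "Secret"} {
		fmt.Printf("%s -> %s\n", kind, PlaneFor(kind))
	}
}
```

Defaulting unknown kinds to the data plane is a design choice of this sketch; a real implementation might prefer to fail on unknown kinds so that nothing is routed by accident.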

This means we need to teach hive to understand two API servers and manage the right objects in the right one.
From a hive API perspective, this is probably just one field in HiveConfig. Internally, hive would use separate clients for control plane and data plane objects.
The hard part is going to be figuring out which code paths need to use which clients, and validating that we didn't miss any.
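
One possible shape for this, as a rough sketch only: a single optional config field selects the data plane, and each client is wrapped in a guard that rejects kinds belonging to the other plane, so a code path that picks the wrong client fails loudly in tests. Everything here is an assumption for illustration: the field name `DataPlaneKubeconfigSecret`, the `Client` interface, `guardedClient`, `NewClients`, and the `dial` callback are hypothetical and do not reflect the actual HiveConfig API or hive's client code.

```go
package main

import (
	"errors"
	"fmt"
)

// HiveConfigSpec is a hypothetical stand-in for the relevant part of HiveConfig.
type HiveConfigSpec struct {
	// DataPlaneKubeconfigSecret is an assumed field name. Empty means
	// a single API server is used for everything.
	DataPlaneKubeconfigSecret string
}

// Client is a minimal stand-in for a Kubernetes API client.
type Client interface {
	Create(kind, name string) error
}

// memClient records created objects in memory.
type memClient struct{ objects []string }

func (m *memClient) Create(kind, name string) error {
	m.objects = append(m.objects, kind+"/"+name)
	return nil
}

// guardedClient wraps a Client and rejects kinds that belong in the other plane.
type guardedClient struct {
	inner   Client
	plane   string
	allowed func(kind string) bool
}

func (g guardedClient) Create(kind, name string) error {
	if !g.allowed(kind) {
		return fmt.Errorf("%s/%s created via %s client", kind, name, g.plane)
	}
	return g.inner.Create(kind, name)
}

// isRunnable is deliberately tiny; the real set of runnable kinds is larger.
func isRunnable(kind string) bool { return kind == "Job" || kind == "Pod" }

// Clients bundles the two clients a controller would choose between.
type Clients struct{ ControlPlane, DataPlane Client }

// NewClients returns one shared client when no data plane is configured,
// and two guarded clients otherwise. dial stands in for building a client
// from the kubeconfig stored in the named secret.
func NewClients(spec HiveConfigSpec, local Client, dial func(secret string) (Client, error)) (Clients, error) {
	if spec.DataPlaneKubeconfigSecret == "" {
		return Clients{ControlPlane: local, DataPlane: local}, nil
	}
	if dial == nil {
		return Clients{}, errors.New("data plane configured but no dialer provided")
	}
	remote, err := dial(spec.DataPlaneKubeconfigSecret)
	if err != nil {
		return Clients{}, err
	}
	return Clients{
		ControlPlane: guardedClient{inner: local, plane: "control plane", allowed: isRunnable},
		DataPlane: guardedClient{inner: remote, plane: "data plane",
			allowed: func(kind string) bool { return !isRunnable(kind) }},
	}, nil
}

func main() {
	local, remote := &memClient{}, &memClient{}
	spec := HiveConfigSpec{DataPlaneKubeconfigSecret: "data-plane-kubeconfig"}
	clients, err := NewClients(spec, local, func(string) (Client, error) { return remote, nil })
	if err != nil {
		panic(err)
	}
	fmt.Println(clients.ControlPlane.Create("Job", "provision"))
	fmt.Println(clients.DataPlane.Create("ClusterDeployment", "spoke-1"))
	// A code path that picked the wrong client is reported as an error.
	fmt.Println(clients.ControlPlane.Create("Secret", "creds"))
}
```

A guard like this only catches mistakes on code paths that tests actually exercise, so it complements rather than replaces an audit of which controllers touch which objects.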

Acceptance Criteria

CI - MUST be running successfully with tests automated
Release Technical Enablement - Provide necessary release enablement details and documents.
...

Dependencies (internal and external)

Previous Work (Optional):

Open questions:

Done Checklist

CI - CI is running, tests are automated and merged.
Release Enablement <link to Feature Enablement Presentation>
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Downstream build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>

depends on

HIVE-1862 [Verizon] split control plane investigation and scoping (To Do)

is related to

ACM-1060 PROTOTYPE: ACM with Hypershift Kubevirt or Zero NodePools (Closed)

links to

openshift/enhancements#1284: hypershift: propose pluggable konnectivity

openshift/hive#1848: Design doc for "Scale Mode"

openshift/hive#1854: ScaleMode

openshift/hive#1872: DNM: Move secret retrieval to hiveutil for AWS prov/deprov

openshift/hive#1874: Move secret/configmap retrieval to hiveutil for prov/deprov

openshift/hive#2582: HIVE-2781: nix terminateWhenFilesChange()

There are no Sub-Tasks for this issue.

Assignee: Eric Fried

Reporter: Mike Worthington

Need Info From: None

Contributors: Eric Fried, Joshua Packer

QA Contact: Mingxia Huang

Doc Contact: None

Votes: 0

Watchers: 11

Created: 2022/04/29 7:07 PM

Updated: 2025/07/18 1:36 PM

Resolved: 2023/12/20 9:19 PM
