-
Feature
-
Resolution: Unresolved
-
Critical
-
50% To Do, 0% In Progress, 50% Done
Epic Goal
- Document the capacity planning characteristics for a Hub management control plane.
- We think this could be an independent white-paper-style document with an accompanying spreadsheet that can be used for calculations.
- "a simple spreadsheet that they plug some numbers in and they get their answer, e.g. X hubs to support this with Y cpu and Z memory."
- As a user, I want to understand the impact of adding custom metrics on the growth of metrics data and system resources
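As a rough illustration of the custom-metrics question, the sketch below computes how many additional samples per day arrive at the hub when each managed cluster starts sending extra time series. The function name and parameters are hypothetical, and the model assumes every new series is collected once per interval from every cluster; storage and memory cost per sample would still need to be measured.

```python
SECONDS_PER_DAY = 86_400


def added_samples_per_day(clusters: int,
                          new_series_per_cluster: int,
                          interval_seconds: float) -> float:
    """Extra samples per day reaching the hub (hypothetical helper).

    Assumes each new time series is collected once per interval
    from every managed cluster.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return clusters * new_series_per_cluster * SECONDS_PER_DAY / interval_seconds
```

Growth is linear in both the number of clusters and the number of new series, so the same custom metric costs proportionally more on a larger fleet.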
Why is this important?
- Customers need to know what to expect from the Hub management.
- Support teams, field engineers, and consultants need well-established facts when discussing ACM capacity with customers.
Scenarios
note: the word "handle" means:
- ACM console does not exhibit a user experience degradation; a load time of 2 seconds or less on all dashboard pages, 90% of the time, is acceptable.
- all aspects of ACM remain functional (search, observability, Grafana, Alertmanager, policy create/enforce, cluster create/import, application create/deploy, Ansible integration, Submariner, Gatekeeper, etc.)
- search collector configuration can be leveraged to reduce the amount of search collection taking place - make a note of what changes were needed to achieve success
- observability data collection can be tuned to an appropriate level such that data is still collected and displayed in Grafana - make a note of what changes were needed to achieve success
- all APIs remain operational and UP
- all control plane aspects of the underlying OCP of the hub are stable
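The console criterion in the list above (2 seconds or less, 90% of the time) can be checked mechanically against measured page load times. A minimal sketch, assuming the load times have already been collected in seconds; the function name and the way samples are gathered are hypothetical:

```python
from typing import Sequence


def meets_load_time_target(load_times_seconds: Sequence[float],
                           threshold_seconds: float = 2.0,
                           required_fraction: float = 0.90) -> bool:
    """Return True if enough page loads finished within the threshold.

    Defaults mirror the ticket: 2 seconds or less, 90% of the time.
    """
    if not load_times_seconds:
        raise ValueError("need at least one load-time sample")
    within = sum(1 for t in load_times_seconds if t <= threshold_seconds)
    return within / len(load_times_seconds) >= required_fraction
```

Running this per dashboard page, rather than over all pages pooled together, keeps one slow page from being hidden by many fast ones.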
1. How many clusters can a single ACM hub handle?
- How many single-node clusters can a single ACM hub handle?
- How many 100 node clusters can a single ACM hub handle?
- How many 250 node clusters can a single ACM hub handle?
- How many 500 node clusters can a single ACM hub handle?
- Note: OpenShift documents 500 nodes as the recommended maximum: https://docs.openshift.com/container-platform/4.10/scalability_and_performance/recommended-install-practices.html
2. How many Applications can a single ACM hub handle?
- Cross-check this against the four cluster-size cases under scenario 1 above
3. How many Policies can a single ACM hub handle?
- Cross-check this against the four cluster-size cases under scenario 1 above
Acceptance Criteria
- CI - MUST be running successfully with tests automated
- Release Technical Enablement - Provide necessary release enablement details and documents.
- A documentation topic should be created specifically catering to the Capacity Planning aspects
- A blog should be produced to further explore this topic in a user-friendly readable format
- A "spreadsheet" calculator of some sort that can accept input of clustersNum + nodesNum + policiesNum + applicationsNum and provide recommended hub sizing output of clusterSize, cpuNum, memNum
- Hub should be a 3-node control plane, ACM hub running on dedicated infrastructure nodes
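The calculator could take a shape like the sketch below. It is only an illustration: the linear cost model, the `Coefficients` structure, and the reading of clusterSize as a count of hub infrastructure nodes are all assumptions, and every coefficient must come from the scale tests this epic calls for. None are supplied here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    clusters_num: int       # managed clusters
    nodes_num: int          # nodes per managed cluster
    policies_num: int       # policies on the hub
    applications_num: int   # applications on the hub


@dataclass(frozen=True)
class Cost:
    cpu: float  # cores
    mem: float  # GiB


@dataclass(frozen=True)
class Coefficients:
    """Per-unit costs; hypothetical, to be filled in from measurements."""
    base: Cost
    per_cluster: Cost
    per_node: Cost
    per_policy: Cost
    per_application: Cost


def recommend_hub_size(w: Workload, c: Coefficients,
                       node_cpu: float, node_mem: float) -> dict:
    """Assumed linear model: total = base + sum(count * per-unit cost)."""
    total_nodes = w.clusters_num * w.nodes_num
    units = [
        (1, c.base),
        (w.clusters_num, c.per_cluster),
        (total_nodes, c.per_node),
        (w.policies_num, c.per_policy),
        (w.applications_num, c.per_application),
    ]
    cpu = sum(count * cost.cpu for count, cost in units)
    mem = sum(count * cost.mem for count, cost in units)
    # clusterSize is interpreted here as the number of hub infra nodes
    # needed to fit both the CPU and the memory totals.
    cluster_size = max(math.ceil(cpu / node_cpu), math.ceil(mem / node_mem))
    return {"clusterSize": cluster_size, "cpuNum": cpu, "memNum": mem}
```

A linear model is the simplest starting point for a spreadsheet; if the scale tests show non-linear behavior (for example, policy cost growing with cluster count), the formula would need an interaction term.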
Dependencies (internal and external)
- OCP, k8s - we need to understand this capacity plan within the context of kubernetes resources generically
Previous Work (Optional):
- …
Open questions:
- We think this is Obs+SRE squad members?
Done Checklist
- CI - CI is running, tests are automated and merged.
- Release Enablement <link to Feature Enablement Presentation>
- DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
- DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
- DEV - Downstream build attached to advisory: <link to errata>
- QE - Test plans in Polarion: <link or reference to Polarion>
- QE - Automated tests merged: <link or reference to automated tests>
- DOC - Downstream documentation merged: <link to meaningful PR>