Type: Story
Resolution: Done
Priority: Major
Fix Version/s: Global Hub 1.7.0
Affects Version/s: Global Hub 1.7.0
Component/s: Global Hub
Labels:

Activity Type:
Product / Portfolio Work
Story Points:
3
Blocked:
False
Blocked Reason:
None
Ready:
False
Epic Link:
ACM-26450
Acceptance Criteria:
Provide the required acceptance criteria using this template. ...
Intelligence Requested:
Market:

Sprint:
GH Train-34
Severity:
Important

Regression:
None

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

PX Impact Score:

Value Statement

Currently, during the deploying stage of a migration, the source hub sends all resources to the target hub in a single large event. This approach can cause:

Performance bottlenecks when migrating clusters with many resources
Timeout issues due to large payload sizes
Poor scalability for large-scale migrations
Difficulty in tracking progress and debugging failures

To improve migration scalability and reliability, we should support batch sending of resources during the deploying stage.

Proposed Solution

Instead of sending all resources in one single event, the migration controller should:

Split resources into configurable batch sizes (e.g., 10, 20, 50 resources per batch)
Send multiple smaller events sequentially or in parallel
Provide progress tracking for each batch
Handle batch-level failures gracefully with retry mechanisms
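
The splitting and batch-level retry steps above can be sketched roughly as follows. This is only an illustration in Python, not the migration controller's actual code: `split_into_batches`, `send_with_retry`, the `send` callback, and the `max_attempts` default are all hypothetical names and values chosen for the example.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def split_into_batches(resources: Sequence[T], batch_size: int) -> List[List[T]]:
    """Split resources into consecutive batches of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    return [
        list(resources[start:start + batch_size])
        for start in range(0, len(resources), batch_size)
    ]


def send_with_retry(
    batch: List[T],
    send: Callable[[List[T]], None],  # hypothetical transport callback
    max_attempts: int = 3,            # assumed default, not from the issue
) -> int:
    """Call send(batch) until it succeeds and return the attempts used.

    Only the failing batch is retried. The last error is re-raised once
    max_attempts is exhausted.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            send(batch)
            return attempt
        except Exception:
            if attempt == max_attempts:
                raise
    raise AssertionError("unreachable")
```

Because the retry wraps a single batch, a transient failure only repeats that batch's payload rather than the whole migration.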

Definition of Done

Core Functionality

[x] Migration controller can split resources into configurable batch sizes
[x] Resources are sent in multiple events instead of a single large event
[x] Each batch is sent sequentially, with an optional delay between batches
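
As a rough illustration of the sequential sending described above, the following Python sketch sends one event per batch and pauses between batches when a delay is configured. The names `send_in_batches`, `send_event`, and `delay_seconds` are assumptions made for this example and do not reflect the actual Global Hub API.

```python
import time
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def send_in_batches(
    resources: Sequence[T],
    batch_size: int,
    send_event: Callable[[List[T]], None],      # hypothetical: sends one event
    delay_seconds: float = 0.0,                 # hypothetical: optional pause
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send resources as one event per batch, in order.

    Returns the number of events sent. When delay_seconds is positive,
    the function pauses between batches, but not after the last one.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    batches = [
        list(resources[start:start + batch_size])
        for start in range(0, len(resources), batch_size)
    ]
    for index, batch in enumerate(batches):
        send_event(batch)
        is_last = index == len(batches) - 1
        if delay_seconds > 0 and not is_last:
            sleep(delay_seconds)
    return len(batches)
```

Passing `sleep` as a parameter is a design choice for the sketch: it lets a test replace the real pause with a recorder.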

Observability & Monitoring

[x] Batch progress is visible in migration status (e.g., "Batch 3/10 deployed")
[x] Logs clearly indicate which batch is being processed
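
One way to produce the progress message and per-batch log lines described above is sketched below in Python. Only the "Batch 3/10 deployed" wording comes from this issue. The function names, the logger name, and the log format are hypothetical.

```python
import logging
import math
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("migration.deploying")


def total_batches(resource_count: int, batch_size: int) -> int:
    """Number of batches needed to cover resource_count resources."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    return math.ceil(resource_count / batch_size)


def batch_progress(deployed: int, total: int) -> str:
    """Format a status message such as 'Batch 3/10 deployed'."""
    return f"Batch {deployed}/{total} deployed"


def deploy_with_progress(
    batches: Sequence[List[T]],
    send_event: Callable[[List[T]], None],  # hypothetical transport callback
) -> List[str]:
    """Send each batch in order, logging it and collecting status messages."""
    statuses: List[str] = []
    total = len(batches)
    for number, batch in enumerate(batches, start=1):
        logger.info("Processing batch %d/%d (%d resources)", number, total, len(batch))
        send_event(batch)
        statuses.append(batch_progress(number, total))
    return statuses
```

For example, 95 resources with a batch size of 10 give `total_batches(95, 10) == 10`, so the status would step from "Batch 1/10 deployed" to "Batch 10/10 deployed".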

Benefits

1. Scalability: Handle migrations with thousands of resources without performance degradation
2. Reliability: Smaller payloads reduce timeout and network issues
3. Observability: Better progress tracking and failure isolation
4. Resource Efficiency: Lower memory usage on both source and target hubs

Development Complete

The code is complete.
Functionality is working.
Any required downstream Docker file changes are made.

~~Tests Automated~~
~~[ ] Unit/function tests have been automated and incorporated into the build.~~
~~[ ] 100% automated unit/function test coverage for new or changed APIs.~~

~~Secure Design~~
~~[ ] Security has been assessed and incorporated into your threat model.~~

~~Multidisciplinary Teams Readiness~~
[ ] Create an informative documentation issue using the Customer Portal Doc template that you can access from [The Playbook](https://docs.google.com/document/d/1YTqpZRH54Bnn4WJ2nZmjaCoiRtqmrc2w6DdQxe_yLZ8/edit#heading=h.9fvyr2rdriby), and ensure the doc acceptance criteria are met.
~~[ ] Link the development issue to the doc issue.~~

~~Support Readiness~~
~~[ ] The must-gather script has been updated.~~

clones

ACM-22901 As a clusteradmin, I can custom each stage Timeout in the migration CR

Closed

is cloned by

ACM-22959 As global hub admin, I can see that some cluster migrations succeeded while others failed

Closed

Assignee: Meng Yan
Reporter: Meng Yan
QA Contact: Hui Chen
Votes: 0
Watchers: 1
Created: 2025/08/07 2:41 AM
Updated: 2025/12/01 2:50 AM
Resolved: 2025/12/01 2:50 AM
