-
Story
-
Resolution: Unresolved
-
Major
-
Global Hub 1.6.0
Migration relies on Kafka to transmit resources. Kafka's built-in persistence ensures that data is not lost, but data inconsistency can still occur under certain boundary conditions.
Problem: Cross Migration Session Inconsistency
When the agent's consumer group ID changes, the agent may re-consume migration events from previous migration sessions. This leads to incorrect operations, such as cleaning up resources that should not be cleaned up.
Solution
To address this issue, we propose the following improvements:
1. Add Expiration Time to Migration Events
- Each migration event should have an expiration time (based on stage timeout settings)
- The agent will not consume expired migration events
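The expiration check could be sketched as follows. This is a minimal illustration, not the Global Hub implementation: the `MigrationEvent` fields and `is_expired` are hypothetical names, and the expiry is assumed to be the event creation time plus the stage timeout.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class MigrationEvent:
    # Hypothetical event shape; the field names are assumptions for illustration.
    migration_id: str
    stage: str
    created_at: datetime      # timezone-aware, in UTC
    stage_timeout: timedelta  # taken from the stage timeout settings

    @property
    def expires_at(self) -> datetime:
        # Assumption: an event expires once its stage timeout has elapsed.
        return self.created_at + self.stage_timeout


def is_expired(event: MigrationEvent, now: datetime) -> bool:
    """Return True if the agent should skip this event as expired."""
    return now >= event.expires_at


now = datetime.now(timezone.utc)
stale = MigrationEvent("migration-old", "Validating",
                       now - timedelta(hours=1), timedelta(minutes=5))
print(is_expired(stale, now))  # True: the 5-minute timeout elapsed long ago
```

Carrying the expiry on the event itself means the agent can reject a replayed event without needing any local state.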
2. Persist Migration Processing State
- Save not only the current migration ID being processed by the agent
- Also persist the start time of the migration ID processing to the ConfigMap `multicluster-global-hub-agent-sync-state`
- Migration events that have not yet expired can still be filtered by this start time
- ConfigMap:
  - is empty: 10 min
  - is not empty: based on the cached timestamp
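A sketch of how the filtering cutoff might be derived from the persisted state. The ConfigMap is modeled as a plain dict, and the key name `migrationStartTime`, the function names, and the reading of "10 min" as a default look-back window are assumptions rather than details confirmed by this story.

```python
from datetime import datetime, timedelta, timezone
from typing import Mapping

# Assumption: with no persisted state, only events from the last 10 minutes are accepted.
DEFAULT_LOOKBACK = timedelta(minutes=10)
START_TIME_KEY = "migrationStartTime"  # hypothetical ConfigMap data key


def event_cutoff(configmap_data: Mapping[str, str], now: datetime) -> datetime:
    """Return the earliest event timestamp the agent should still accept."""
    cached = configmap_data.get(START_TIME_KEY)
    if not cached:
        # Empty ConfigMap: fall back to the default look-back window.
        return now - DEFAULT_LOOKBACK
    # Cached values are assumed to carry an explicit UTC offset.
    return datetime.fromisoformat(cached).astimezone(timezone.utc)


def should_process(event_time: datetime, configmap_data: Mapping[str, str],
                   now: datetime) -> bool:
    """Accept only events created at or after the cutoff."""
    return event_time >= event_cutoff(configmap_data, now)
```

With this shape, an agent that restarts under a new consumer group ID would drop any replayed event older than the persisted start time, even if that event has not yet expired.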
3. Use UTC for Time Consistency
- All timestamps should use UTC to ensure time consistency between agent and manager
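For illustration, a timestamp can be normalized to UTC before it is written or compared. The helper names below are hypothetical, and the ISO 8601 string form is an assumption about the wire format, which this story does not specify.

```python
from datetime import datetime, timedelta, timezone


def to_utc_string(ts: datetime) -> str:
    """Serialize a timezone-aware timestamp as an ISO 8601 string in UTC."""
    if ts.tzinfo is None:
        raise ValueError("naive timestamps are ambiguous; attach a timezone first")
    return ts.astimezone(timezone.utc).isoformat()


def from_utc_string(value: str) -> datetime:
    """Parse a timestamp with an explicit offset and return it in UTC."""
    return datetime.fromisoformat(value).astimezone(timezone.utc)


# A manager running at UTC+8 and an agent at UTC-5 agree on the same instant.
manager_local = datetime(2025, 1, 1, 20, 0, tzinfo=timezone(timedelta(hours=8)))
print(to_utc_string(manager_local))  # 2025-01-01T12:00:00+00:00
```

Rejecting naive timestamps up front avoids silently interpreting them in the local time zone of whichever component parses them.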
4. Improve Migration ID Initialization Logic
- Use the migration ID to ensure that the agent processes only the current migration session
- Previously, the migration ID was set when encountering the Validating stage (i.e., the beginning phase of a session)
- With the addition of the timestamp, we can now set the current migration ID based on both the timestamp and the Validating stage
- This approach avoids initializing the current migration ID to an already-processed migration session
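The combined check might look like the sketch below. The `AgentState` class, its fields, and `maybe_start_session` are hypothetical; the only parts taken from this story are that a session starts at the Validating stage and that the event time is compared against the persisted start time.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class AgentState:
    # Hypothetical in-memory mirror of the persisted sync state.
    current_migration_id: Optional[str] = None
    started_at: Optional[datetime] = None  # UTC start time of the current migration


def maybe_start_session(state: AgentState, migration_id: str, stage: str,
                        event_time: datetime) -> bool:
    """Adopt migration_id as current only for a fresh Validating event."""
    if stage != "Validating":
        return False  # only the first stage of a session may initialize the ID
    if state.started_at is not None and event_time <= state.started_at:
        return False  # replayed event from an already-processed session
    state.current_migration_id = migration_id
    state.started_at = event_time
    return True


state = AgentState()
fresh = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
replayed = datetime(2025, 1, 1, 11, 0, tzinfo=timezone.utc)
print(maybe_start_session(state, "migration-b", "Validating", fresh))     # True
print(maybe_start_session(state, "migration-a", "Validating", replayed))  # False
```

In the usage above, the replayed Validating event from the older session is ignored because its timestamp precedes the start time already recorded for the current session.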
Other Cases:
- Agent crashes and restarts after 1 day -> roll back to get back to a working state
- Kafka restart
Related Issues
This story, when completed, will also fix the problem encountered in ACM-23920.