Red Hat Advanced Cluster Management / ACM-25051

As a Global Hub Admin, I can ensure the migration state consistency across migration sessions



      Migration uses Kafka to transmit resources. Kafka's built-in persistence ensures that data is not lost, but data inconsistency can still occur under certain boundary conditions.

      Problem: Cross Migration Session Inconsistency

      When the agent's consumer group ID changes, the agent may re-consume migration events from previous migration sessions, leading to incorrect operations such as cleaning resources that shouldn't be cleaned.

      Solution

      To address this issue, we propose the following improvements:

      1. Add Expiration Time to Migration Events

      • Each migration event should have an expiration time (based on stage timeout settings)
      • The agent will not consume expired migration events

      2. Persist Migration Processing State

      • Persist not only the migration ID the agent is currently processing, but also the time at which processing of that migration ID started
      • Both values are stored in the ConfigMap `multicluster-global-hub-agent-sync-state`
      • Migration events that have not yet expired can still be filtered out by this start time
      • Filter time, depending on the ConfigMap:
           is empty: 10 min
           not empty: based on the cached timestamp

      3. Use UTC for Time Consistency

      • All timestamps should use UTC to ensure time consistency between agent and manager

      4. Improve Migration ID Initialization Logic

      • Use the migration ID to ensure that the agent processes only the current migration session
      • Previously, the migration ID was set when encountering the Validating stage (i.e., the beginning phase of a session)
      • With the addition of the timestamp, the current migration can now be set based on both the time and the Validating stage
      • This approach avoids initializing the current migration ID to an already-processed migration session

      Other Cases: 

      • Crash for 1 day, then restart -> roll back to get working again
      • Kafka restart

       

      Related Issues
      This story, when completed, will also fix the problem encountered in ACM-23920.

              rh-ee-myan Meng Yan
              Yaheng Liu Yaheng Liu