-
Bug
-
Resolution: Done
-
Major
-
Global Hub 1.7.0
-
Quality / Stability / Reliability
-
1
-
False
-
-
False
-
-
-
GH Train-36
-
Moderate
-
None
Problem
After the manager or hub is restarted, migration operations hang at the Registering or Cleaning stage until they time out. The logs show the warning: "MigrationStatus is nil for migrationId: xxx".
Root Cause
1. migrationStatuses is an in-memory map, so its contents are lost when the manager restarts.
2. After the restart, Kafka replays migration status events to the handler.
3. The handler calls SetFinished(), but the MigrationStatus entry for that migration is nil.
4. SetFinished() fails silently, so GetFinished() always returns false.
5. The controller keeps waiting for the stage to finish until it times out, rolls back, and marks the migration as failed.
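The failure mode above can be reproduced with a minimal Go sketch. Only the names migrationStatuses, MigrationStatus, SetFinished, and GetFinished come from this ticket. The function signatures, the hub and phase parameters, and the finished map are assumptions made for illustration and may not match the real code.

```go
package main

import "fmt"

// MigrationStatus is a hypothetical stand-in for the per-migration state.
type MigrationStatus struct {
	finished map[string]bool // keyed by hub + "/" + phase (assumed layout)
}

// migrationStatuses mimics the in-memory map. It is empty after a restart.
var migrationStatuses = map[string]*MigrationStatus{}

// SetFinished records that a phase finished. When no entry exists for the
// migration, it only logs a warning and returns, so the update is lost.
func SetFinished(migrationID, hub, phase string) {
	status := migrationStatuses[migrationID]
	if status == nil {
		fmt.Printf("MigrationStatus is nil for migrationId: %s\n", migrationID)
		return
	}
	status.finished[hub+"/"+phase] = true
}

// GetFinished reports whether a phase finished. A missing entry reads as false.
func GetFinished(migrationID, hub, phase string) bool {
	status := migrationStatuses[migrationID]
	if status == nil {
		return false
	}
	return status.finished[hub+"/"+phase]
}

func main() {
	// Simulate a replayed status event arriving after the map was wiped.
	SetFinished("mig-1", "hub1", "Registering")
	// The controller polls this value and never sees true.
	fmt.Println(GetFinished("mig-1", "hub1", "Registering"))
}
```

Because the update is dropped without returning an error, nothing signals the controller that the event was ever received.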
Solution
Add lazy initialization of migration status in the status handler by calling AddMigrationStatus() before any status updates.
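A minimal Go sketch of the lazy initialization follows. AddMigrationStatus, SetFinished, and GetFinished are named in this ticket, but their signatures here are assumed, and handleStatusEvent is a hypothetical handler entry point. The sketch is not the code from the PR and omits any locking the real map may need.

```go
package main

import "fmt"

// MigrationStatus is a hypothetical stand-in for the per-migration state.
type MigrationStatus struct {
	finished map[string]bool // keyed by hub + "/" + phase (assumed layout)
}

var migrationStatuses = map[string]*MigrationStatus{}

// AddMigrationStatus creates the entry only when it is missing, so calling
// it on every event does not overwrite state that was already recorded.
func AddMigrationStatus(migrationID string) {
	if _, ok := migrationStatuses[migrationID]; !ok {
		migrationStatuses[migrationID] = &MigrationStatus{finished: map[string]bool{}}
	}
}

// SetFinished records that a phase finished, if an entry exists.
func SetFinished(migrationID, hub, phase string) {
	if status := migrationStatuses[migrationID]; status != nil {
		status.finished[hub+"/"+phase] = true
	}
}

// GetFinished reports whether a phase finished.
func GetFinished(migrationID, hub, phase string) bool {
	status := migrationStatuses[migrationID]
	return status != nil && status.finished[hub+"/"+phase]
}

// handleStatusEvent (hypothetical) initializes the status lazily before
// applying the update, so replayed events survive a restart.
func handleStatusEvent(migrationID, hub, phase string) {
	AddMigrationStatus(migrationID)
	SetFinished(migrationID, hub, phase)
}

func main() {
	handleStatusEvent("mig-1", "hub1", "Registering")
	fmt.Println(GetFinished("mig-1", "hub1", "Registering")) // prints "true"
}
```

Making AddMigrationStatus idempotent is the key design choice in this sketch: the handler can call it unconditionally, whether or not the manager has restarted.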
Impact
- Migration operations fail after any manager restart
- Requires manual intervention to recover
- Affects production stability
Fix Details
PR: https://github.com/stolostron/multicluster-global-hub/pull/2269
Changes: 4 lines added, 1 file changed
Files Modified: manager/pkg/status/handlers/clustermigartion/managedclustermigration_handler.go