Bug
Resolution: Done
Major
Global Hub 1.7.0
Quality / Stability / Reliability
GH Train-35
Problem Statement
Three integration tests were failing intermittently due to race conditions during test cleanup, causing random CI failures.
Version Found
Latest main branch (commit b1d3db00)
Is it reproducible?
Intermittent - occurs during test cleanup when resources are being deleted
Steps to Reproduce
- Run integration tests: make integration-test/agent and make integration-test/operator
- Tests occasionally fail during the cleanup phase because of race conditions
Actual Results
Test 1: Migration ConfigMap Conflict
- Error: configmaps "multicluster-global-hub-agent-sync-state" already exists
- Location: test/integration/agent/migration/migration_*_test.go
- Test Result: 1 Failed
Test 2: Manager Reconciler Panic
- Error: runtime error: invalid memory address or nil pointer dereference at manager_reconciler.go:211
- Location: operator/pkg/controllers/manager/manager_reconciler.go
- The stack trace shows a nil MGH object being accessed in the deferred function
Test 3: Transport Offset Empty String
- Error: Expected <string>: "" To satisfy matchers [test-topic-1, test-topic-2]
- Location: test/integration/manager/status/transport_offset_test.go
- Caused by querying ALL transport records instead of specific test data
Expected Results
All integration tests should pass reliably without race condition failures.
Root Causes
1. Migration ConfigMap Conflict
- Two test suites (migration_from_syncer_test and migration_to_syncer_test) share global AgentConfig
- Both try to create the same configmap multicluster-global-hub-agent-sync-state
- No cleanup between test runs causes "already exists" errors
2. Manager Reconciler Panic
- During test cleanup, the MGH resource is deleted while the controller is still reconciling
- The deferred function at lines 208-217 tries to update the status using mgh.Namespace
- MGH is nil after deletion, which causes the nil pointer dereference
3. Transport Offset Query
- The test used db.Find(&positions) to query ALL transport records
- The results included old-format records from the migration tests, which lack the @partition suffix
- Splitting those records by @ produces empty strings, which fail the assertions
Resolution
All three issues fixed by ensuring proper test isolation:
Migration Tests (PR #2184)
- Delete existing configmap in BeforeAll to ensure clean state
- Use local namespace constants in AfterAll instead of global config
Manager Reconciler (PR #2184)
- Add nil check in defer function before accessing MGH object
- Skip status update gracefully if MGH was deleted
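The guard in the deferred function could look roughly like the sketch below. It is a simplified model, not the actual manager_reconciler.go: the MGH struct, getMGH, updateStatus, and the namespace value are hypothetical stand-ins chosen only to show the nil check.

```go
package main

import "fmt"

// MGH is a hypothetical, stripped-down stand-in for the MGH resource.
type MGH struct {
	Namespace string
	Name      string
}

// getMGH simulates a lookup that returns nil once the resource is deleted.
func getMGH(deleted bool) *MGH {
	if deleted {
		return nil
	}
	// Namespace and name are illustrative values only.
	return &MGH{Namespace: "multicluster-global-hub", Name: "mgh"}
}

// updateStatus records which object would receive the status update.
func updateStatus(mgh *MGH, reconcileErr error, log *[]string) {
	*log = append(*log, fmt.Sprintf("status updated for %s/%s (err=%v)",
		mgh.Namespace, mgh.Name, reconcileErr))
}

// reconcile mirrors the shape of a reconciler with a deferred status update.
func reconcile(deleted bool, log *[]string) (err error) {
	var mgh *MGH

	defer func() {
		// Guard: the resource may have been deleted while reconciling,
		// so mgh.Namespace must not be touched when mgh is nil.
		if mgh == nil {
			*log = append(*log, "MGH is nil, skipping status update")
			return
		}
		updateStatus(mgh, err, log)
	}()

	mgh = getMGH(deleted)
	if mgh == nil {
		return nil // nothing left to reconcile
	}
	return nil
}

func main() {
	var log []string
	_ = reconcile(false, &log) // resource present: status is updated
	_ = reconcile(true, &log)  // resource deleted: update is skipped, no panic
	for _, line := range log {
		fmt.Println(line)
	}
}
```

Without the guard, the second call would dereference a nil pointer inside the deferred function, which matches the panic described above.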
Transport Offset Test (PR #2185)
- Query only the 4 specific records created by the test using WHERE clause
- Add assertion to verify exactly 4 records found
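The effect of narrowing the query can be illustrated in memory. The real test runs a database query with a WHERE clause; the sketch below only models that filtering with a slice, and transportRecord, selectByName, topicOf, and the record names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// transportRecord is a hypothetical stand-in for a row in the transport table.
type transportRecord struct {
	Name string // expected format: "<topic>@<partition>"
}

// selectByName keeps only records whose Name is in wanted,
// playing the role of a WHERE name IN (...) clause.
func selectByName(all []transportRecord, wanted []string) []transportRecord {
	want := make(map[string]struct{}, len(wanted))
	for _, w := range wanted {
		want[w] = struct{}{}
	}
	var out []transportRecord
	for _, r := range all {
		if _, ok := want[r.Name]; ok {
			out = append(out, r)
		}
	}
	return out
}

// topicOf returns the part before "@" and reports whether the suffix exists.
func topicOf(name string) (string, bool) {
	topic, _, found := strings.Cut(name, "@")
	return topic, found
}

func main() {
	// The table holds a leftover old-format record plus the test's own four.
	all := []transportRecord{
		{Name: "leftover-from-migration"},
		{Name: "test-topic-1@0"},
		{Name: "test-topic-1@1"},
		{Name: "test-topic-2@0"},
		{Name: "test-topic-2@1"},
	}
	wanted := []string{"test-topic-1@0", "test-topic-1@1", "test-topic-2@0", "test-topic-2@1"}

	selected := selectByName(all, wanted)
	if len(selected) != 4 {
		panic(fmt.Sprintf("expected exactly 4 records, got %d", len(selected)))
	}
	for _, r := range selected {
		topic, ok := topicOf(r.Name)
		fmt.Println(topic, ok)
	}
}
```

Asserting the record count before inspecting topics makes a polluted table fail with a clear message instead of an obscure matcher error.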
Related PRs
Test Results
Before
- Migration tests: 18 Passed, 1 Failed (random)
- Manager integration: Panic during cleanup
- Status tests: 36 Passed, 1 Failed (random)
After
- Migration tests: 19/19 Passed
- Manager integration: No panic
- Status tests: 37/37 Passed
Additional Information
- Severity: Medium (affects CI reliability but not production)
- All fixes follow defensive programming best practices
- Common theme: Ensure test isolation by avoiding shared global state
- clones: ACM-25280 Fix postgres certificate error: x509 SAN dNSName is malformed (Closed)