ACM-27591: Fix flaky integration tests caused by resource cleanup race conditions


      Problem Statement

      Three integration tests were failing intermittently due to race conditions during test cleanup, causing random CI failures.

      Version Found

      Latest main branch (commit b1d3db00)

      Is it reproducible?

      Intermittent - occurs during test cleanup when resources are being deleted

      Steps to Reproduce

      1. Run integration tests: make integration-test/agent and make integration-test/operator
      2. Tests occasionally fail during cleanup phase with race conditions

      Actual Results

      Test 1: Migration ConfigMap Conflict

      • Error: configmaps "multicluster-global-hub-agent-sync-state" already exists
      • Location: test/integration/agent/migration/migration_*_test.go
      • Test Result: 1 Failed

      Test 2: Manager Reconciler Panic

      • Error: runtime error: invalid memory address or nil pointer dereference at manager_reconciler.go:211
      • Location: operator/pkg/controllers/manager/manager_reconciler.go
      • Stack trace shows a nil MGH object access in the defer function

      Test 3: Transport Offset Empty String

      • Error: Expected <string>: "" To satisfy matchers [test-topic-1, test-topic-2]
      • Location: test/integration/manager/status/transport_offset_test.go
      • Caused by querying ALL transport records instead of specific test data

      Expected Results

      All integration tests should pass reliably without race condition failures.

      Root Causes

      1. Migration ConfigMap Conflict

      • Two test suites (migration_from_syncer_test and migration_to_syncer_test) share the global AgentConfig
      • Both try to create the same ConfigMap, multicluster-global-hub-agent-sync-state
      • No cleanup between test runs causes "already exists" errors (see the sketch below)
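      For illustration, the conflict reduces to something like the sketch below (the helper name and wiring are assumptions, not the actual test code; only the ConfigMap name comes from the failing test):

      {code:go}
      package migration_test

      import (
          "context"

          corev1 "k8s.io/api/core/v1"
          metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
          "sigs.k8s.io/controller-runtime/pkg/client"
      )

      // createSyncStateConfigMap mirrors what each suite effectively does with the
      // shared AgentConfig: build the same ConfigMap and try to create it. Whichever
      // suite runs second gets an AlreadyExists error, because nothing deleted the
      // object the first suite created.
      func createSyncStateConfigMap(ctx context.Context, c client.Client, namespace string) error {
          cm := &corev1.ConfigMap{
              ObjectMeta: metav1.ObjectMeta{
                  Name:      "multicluster-global-hub-agent-sync-state",
                  Namespace: namespace,
              },
          }
          return c.Create(ctx, cm)
      }
      {code}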

      2. Manager Reconciler Panic

      • During test cleanup, the MGH resource gets deleted while the controller is still reconciling
      • The defer function at lines 208-217 tries to update status using mgh.Namespace
      • The MGH object is nil after deletion, causing a nil pointer dereference (see the sketch below)
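      A minimal, self-contained sketch of the failure mode, using a stand-in type rather than the operator's real MGH object:

      {code:go}
      package main

      import "fmt"

      // MulticlusterGlobalHub is a stand-in for the real MGH API object.
      type MulticlusterGlobalHub struct {
          Namespace string
      }

      // reconcile reproduces the shape of the bug: the deferred status update
      // dereferences mgh unconditionally, but when the resource is deleted during
      // cleanup the object is nil.
      func reconcile(mgh *MulticlusterGlobalHub) (err error) {
          defer func() {
              // BUG: no nil check before touching mgh.
              fmt.Println("updating status in namespace", mgh.Namespace)
          }()
          // ... reconcile body elided ...
          return nil
      }

      func main() {
          _ = reconcile(nil) // panics: invalid memory address or nil pointer dereference
      }
      {code}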

      3. Transport Offset Query

      • The test used db.Find(&positions) to query ALL transport records
      • That pulled in old-format records from the migration tests that lack the @partition suffix
      • Splitting those keys on @ produces empty strings, failing the assertions (see the sketch below)
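      A rough sketch of why the unscoped query trips the matcher, assuming the test derives topics by splitting the stored key on @ (the key format shown is an assumption):

      {code:go}
      package main

      import (
          "fmt"
          "strings"
      )

      // topicOf assumes the stored key has the new "<topic>@<partition>" shape.
      // Old-format rows picked up by the unscoped db.Find(&positions) do not have
      // that shape, so the extracted topic comes back as an empty string.
      func topicOf(key string) string {
          return strings.Split(key, "@")[0]
      }

      func main() {
          fmt.Printf("%q\n", topicOf("test-topic-1@0")) // "test-topic-1"
          fmt.Printf("%q\n", topicOf(""))               // "" -> fails the topic matcher
      }
      {code}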

      Resolution

      All three issues were fixed by ensuring proper test isolation:

      Migration Tests (PR #2184)

      • Delete any existing ConfigMap in BeforeAll to ensure a clean state
      • Use local namespace constants in AfterAll instead of the global config (sketched below)
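      A sketch of the isolation fix, assuming a controller-runtime client and a Ginkgo Ordered suite (the variable names and namespace constant are illustrative; the actual change is in PR #2184):

      {code:go}
      package migration_test

      import (
          "context"

          . "github.com/onsi/ginkgo/v2"
          . "github.com/onsi/gomega"
          corev1 "k8s.io/api/core/v1"
          apierrors "k8s.io/apimachinery/pkg/api/errors"
          metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
          "sigs.k8s.io/controller-runtime/pkg/client"
      )

      // Suite-level client and context, assumed to be initialized in BeforeSuite.
      var (
          ctx           context.Context
          runtimeClient client.Client
      )

      // localAgentNamespace is a suite-local constant, used instead of the shared
      // global AgentConfig so cleanup never depends on another suite's state.
      const localAgentNamespace = "multicluster-global-hub-agent"

      var _ = Describe("migration syncer", Ordered, func() {
          syncStateCM := &corev1.ConfigMap{
              ObjectMeta: metav1.ObjectMeta{
                  Name:      "multicluster-global-hub-agent-sync-state",
                  Namespace: localAgentNamespace,
              },
          }

          BeforeAll(func() {
              // Remove any ConfigMap left behind by a sibling suite; NotFound is fine.
              if err := runtimeClient.Delete(ctx, syncStateCM); err != nil {
                  Expect(apierrors.IsNotFound(err)).To(BeTrue())
              }
          })

          AfterAll(func() {
              // Tear down with the local constant rather than the global config.
              if err := runtimeClient.Delete(ctx, syncStateCM); err != nil {
                  Expect(apierrors.IsNotFound(err)).To(BeTrue())
              }
          })
      })
      {code}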

      Manager Reconciler (PR #2184)

      • Add a nil check in the defer function before accessing the MGH object
      • Skip the status update gracefully if the MGH was deleted (see the sketch below)
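      The shape of the added guard, on the same stand-in type as the earlier sketch (not the literal reconciler code):

      {code:go}
      package main

      import "fmt"

      // MulticlusterGlobalHub is a stand-in for the real MGH API object.
      type MulticlusterGlobalHub struct {
          Namespace string
      }

      // reconcile shows the guarded defer: if the MGH was deleted mid-reconcile and
      // the object is nil, the status update is skipped instead of panicking.
      func reconcile(mgh *MulticlusterGlobalHub) (err error) {
          defer func() {
              if mgh == nil {
                  // MGH was deleted during cleanup; nothing to update.
                  return
              }
              fmt.Println("updating status in namespace", mgh.Namespace)
          }()
          // ... reconcile body elided ...
          return nil
      }

      func main() {
          _ = reconcile(nil) // returns cleanly; no panic during cleanup
      }
      {code}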

      Transport Offset Test (PR #2185)

      • Query only the 4 records created by the test, using a WHERE clause
      • Add an assertion to verify that exactly 4 records are found (see the sketch below)
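      A sketch of the scoped query, assuming a GORM handle and Gomega assertions (the model and key values are illustrative; the actual change is in PR #2185):

      {code:go}
      package status_test

      import (
          . "github.com/onsi/gomega"
          "gorm.io/gorm"
      )

      // TransportRecord is a stand-in for the real transport model.
      type TransportRecord struct {
          Name    string
          Payload []byte
      }

      // assertTestOffsets scopes the query to the four keys this test wrote instead
      // of loading every row in the transport table, then checks the exact count.
      func assertTestOffsets(db *gorm.DB) {
          keys := []string{
              "test-topic-1@0", "test-topic-1@1",
              "test-topic-2@0", "test-topic-2@1",
          }
          var positions []TransportRecord
          Expect(db.Where("name IN ?", keys).Find(&positions).Error).NotTo(HaveOccurred())
          Expect(positions).To(HaveLen(4)) // exactly the records created by this test
      }
      {code}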

      Related PRs

      • PR #2184: migration test cleanup and manager reconciler nil check
      • PR #2185: transport offset test query scoping

      Test Results

      Before

      • Migration tests: 18 Passed, 1 Failed (random)
      • Manager integration: Panic during cleanup
      • Status tests: 36 Passed, 1 Failed (random)

      After

      • Migration tests: 19/19 Passed
      • Manager integration: No panic
      • Status tests: 37/37 Passed

      Additional Information

      • Severity: Medium (affects CI reliability but not production)
      • All fixes follow defensive programming best practices
      • Common theme: Ensure test isolation by avoiding shared global state

      Generated with Claude Code
