Type: Bug
Resolution: Done
Priority: Major
Fix Version/s: Global Hub 1.7.0
Affects Version/s: Global Hub 1.7.0
Component/s: Global Hub
Labels:
- GlobalHub

Activity Type:
Quality / Stability / Reliability
Story Points:
0.5
Blocked:
False
Blocked Reason:
None
Ready:
False
Intelligence Requested:
Market:

Sprint:
GH Train-34, GH Train-35

Regression:
None

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

PX Impact Score:

Description of problem:

The integration test "should delete the ServiceMonitor when mgh deleted" was flaky, passing in only about 75% of runs. It failed intermittently because of a race condition introduced by the two-phase cleanup behavior in the manager reconciler's pruneResources function.

Version-Release number of selected component (if applicable):

Global Hub 1.7.0

How reproducible:

Intermittent - approximately 75% pass rate before fix

Steps to Reproduce:

1. Run the integration test "should delete the ServiceMonitor when mgh deleted"
2. The test calls reconcile() once after deleting the MulticlusterGlobalHub instance
3. Observe that ServiceMonitor deletion may not occur in the first reconciliation

Actual results:

The test failed intermittently because:
- Phase 1: pruneResources deletes the ManagedClusterMigrations and returns early
- Phase 2: On the next reconciliation (automatic rescheduling), the ServiceMonitor gets deleted
- The test called reconcile() only once and relied on the controller's automatic rescheduling for the second pass, creating a race condition
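The two phases above can be sketched with a small in-memory model. This is only an illustration, not the actual pruneResources code: the `cluster` type and the `pruneTwoPhase` function are hypothetical names, and real deletions would go through the Kubernetes API rather than struct fields.

```go
package main

import "fmt"

// cluster is a hypothetical in-memory stand-in for the API server state.
type cluster struct {
	migrations     []string // names of remaining ManagedClusterMigrations
	serviceMonitor bool     // true while the ServiceMonitor still exists
}

// pruneTwoPhase mimics the early-return behavior described in this issue
// (an assumption about its shape, not the real implementation).
// It reports whether another reconciliation is needed.
func pruneTwoPhase(c *cluster) (requeue bool) {
	if len(c.migrations) > 0 {
		c.migrations = nil
		return true // early return: the ServiceMonitor is left for the next pass
	}
	c.serviceMonitor = false
	return false
}

func main() {
	c := &cluster{migrations: []string{"migration-a"}, serviceMonitor: true}

	requeue := pruneTwoPhase(c)
	fmt.Println(requeue, c.serviceMonitor) // true true: ServiceMonitor survives pass 1

	requeue = pruneTwoPhase(c)
	fmt.Println(requeue, c.serviceMonitor) // false false: deleted only on pass 2
}
```

A test that calls the reconcile step once and then immediately asserts on the ServiceMonitor observes the state after pass 1 unless the rescheduled pass happens to run first, which matches the intermittent failures reported here.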

Expected results:

ServiceMonitor should be deleted in a single reconciliation call, eliminating the need for multiple reconciliation cycles.

Additional info:

Root cause: Early return in pruneResources prevented ServiceMonitor cleanup when migrations existed
Fix: Modified pruneResources to delete both migrations and ServiceMonitor in a single reconciliation call (PR #2131)
Impact: Integration test flakiness and inefficient resource cleanup (two reconciliations instead of one)
Test results after fix: 4/4 test runs passed (100% success rate)
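The shape of the fix can be sketched in the same simplified style. This is a minimal model under stated assumptions, not the code from PR #2131: `fakeCluster`, `deleteMigrations`, `deleteServiceMonitor`, and `pruneSinglePass` are hypothetical names, and the deletions are simulated on struct fields.

```go
package main

import "fmt"

// fakeCluster is a hypothetical in-memory stand-in for the API server state.
type fakeCluster struct {
	migrations     []string // names of remaining ManagedClusterMigrations
	serviceMonitor bool     // true while the ServiceMonitor still exists
}

// deleteMigrations simulates removing every ManagedClusterMigration.
func (c *fakeCluster) deleteMigrations() error {
	c.migrations = nil
	return nil
}

// deleteServiceMonitor simulates removing the ServiceMonitor.
func (c *fakeCluster) deleteServiceMonitor() error {
	c.serviceMonitor = false
	return nil
}

// pruneSinglePass performs both cleanup steps in one call. Only a real
// error stops the pass; existing migrations no longer cause an early return.
func pruneSinglePass(c *fakeCluster) error {
	if err := c.deleteMigrations(); err != nil {
		return err
	}
	return c.deleteServiceMonitor()
}

func main() {
	c := &fakeCluster{migrations: []string{"migration-a"}, serviceMonitor: true}
	if err := pruneSinglePass(c); err != nil {
		fmt.Println("prune failed:", err)
		return
	}
	fmt.Println(len(c.migrations), c.serviceMonitor) // 0 false after a single call
}
```

With this structure, a test that reconciles once can assert on the ServiceMonitor directly, without depending on when the controller schedules a follow-up pass.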

🤖 Generated with Claude Code

Assignee: Meng Yan

Reporter: Meng Yan

QA Contact: Yaheng Liu

Votes: 0

Watchers: 1

Created: 2025/12/03 2:49 AM

Updated: 2025/12/08 12:49 PM

Resolved: 2025/12/08 12:49 PM
