Type: Bug
Resolution: Done
Priority: Major
Fix Version/s: None
Affects Version/s: None
Component/s: Global Hub
Labels:
- GlobalHub

Activity Type:
Quality / Stability / Reliability
Story Points:
1
Blocked:
False
Blocked Reason:

None
Ready:
False
Intelligence Requested:
Market:

Sprint:
GH Train-35

Regression:
None

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

PX Impact Score:

Description of problem:

When the manager tries to insert event data at the beginning of a new month, the insert fails with a partition-not-found error. This occurs when the operator restarts near the end of the month (e.g., Dec 28-31), after the data retention cron job has already created the next month's partition.

The affected tables include:
- event.local_policies
- event.local_root_policies
- event.managed_clusters
- history.local_compliance

Version-Release number of selected component (if applicable):

Global Hub 1.7.0

How reproducible:

Reproducible whenever the operator restarts between the cron job execution on the 28th and the end of the month, and event data is then inserted in the new month.

Steps to Reproduce:

1. Wait for the data retention cron job to execute on Dec 28, 00:00 (creates the 2026_01 partition)
2. Restart the operator between Dec 29-31 (the operator only recreates the 2025_12 and 2025_11 partitions; 2026_01 is lost)
3. Wait for the month rollover to January
4. The cron job executes on Jan 1, 00:00 (creates 2026_02 but does not check for 2026_01)
5. Attempt to insert event data with a created_at timestamp in January

Actual results:

Manager fails to insert event data with error:

2026/01/05 03:43:45 /workspace/manager/pkg/status/handlers/policy/local_replicated_policy_event_handler.go:122 
pq: no partition of relation "local_policies" found for row

[2.912ms] [rows:0] INSERT INTO "event"."local_policies" 
("event_name","event_namespace","policy_id","message","leaf_hub_name","reason","count","source","compliance","cluster_id","cluster_name","created_at") 
VALUES ('policies.policy-subscriptions.1887b6f55cb89eaf','local-cluster','f830746d-9308-47c9-a1e1-efeb8790fb42',...,'2026-01-05 03:09:03')

2026-01-05T03:43:45.755Z WARN workerpool/worker.go:131 
failed to handle event (event.localreplicatedpolicy): failed handling leaf hub LocalPolicyStatusEvent event - 
pq: no partition of relation "local_policies" found for row

The current month partition (2026_01) is missing from the database.

Expected results:

The current month partition should exist and data insertion should succeed. The system should handle operator restarts near month boundaries gracefully without losing partition tables.

Additional info:

Root Cause:
The system has a design flaw in partition management with two components creating partitions:
1. Operator Initialization: Creates current month + previous month partitions (only on operator start/restart)
2. Data Retention Cron Job: Creates next month partition only (executes 1st, 15th, 28th at 00:00)

When operator restarts between Dec 28-31, it recreates only 2025_12 and 2025_11, potentially losing the 2026_01 partition created by the cron job.

Solution Implemented:
1. Added ensurePartitionExists() function to check and create current month partition if missing
2. Modified data retention job to ensure current month partition exists before creating next month
3. Updated manager startup to always run data-retention job, ensuring partition integrity on every restart

PR: https://github.com/stolostron/multicluster-global-hub/pull/2221

🤖 Generated with Claude Code

is related to

ACM-27978 Use PostgreSQL trigger to replace data retention job logic

Assignee: Meng Yan

Reporter: Meng Yan

QA Contact: Yaheng Liu

Votes: 0

Watchers: 1

Created: 2026/01/05 5:59 AM

Updated: 2026/01/06 6:58 AM

Resolved: 2026/01/06 6:58 AM
