Story
Resolution: Done
Major
Global Hub 1.7.0
Product / Portfolio Work
User Story
As a Global Hub administrator, I want to configure explicit Kafka message retention policies so that old events are automatically cleaned up, preventing unlimited storage growth and associated memory issues.
Problem
Currently, the built-in Kafka topics are configured with cleanup.policy: compact only, with no explicit retention settings. This causes:
- Events to be kept indefinitely, with no automatic time-based or size-based deletion
- The compact policy only removes superseded versions of the same key; it never deletes records based on age or size
- All events to accumulate in Kafka without bound, contributing to memory growth in long-running clusters
Current Configuration
Kafka has two configuration levels:
- Broker level (global defaults for all topics):
  - log.retention.ms (Kafka default: 7 days)
  - log.retention.bytes (Kafka default: unlimited)
  - log.cleanup.policy (Kafka default: delete)
  - Location: operator/pkg/controllers/transporter/protocol/strimzi_transporter.go:802-819
  - Currently: only replication settings are configured; no retention parameters are set (Kafka defaults apply)
- Topic level (overrides broker defaults):
  - Location: operator/pkg/controllers/transporter/protocol/strimzi_transporter.go:635-637
  - Currently: cleanup.policy: compact only
  - No retention.ms configured
  - No retention.bytes configured
The Issue: Even though the Kafka broker has a default 7-day retention, the topic-level cleanup.policy: compact (without delete) makes the broker retention policy ineffective, so data is kept indefinitely.
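For reference, the compact-only topic configuration described above can be sketched as the generic key/value config map a Strimzi KafkaTopic spec carries. This is a hypothetical Go illustration, not the actual code at the cited strimzi_transporter.go lines:

    package protocol

    // currentTopicConfig sketches the existing topic-level settings described
    // above. Hypothetical helper, not the actual operator code.
    func currentTopicConfig() map[string]interface{} {
        return map[string]interface{}{
            // Compaction only: superseded values for the same key are pruned,
            // but records are never deleted by age or size, so the broker's
            // default 7-day retention never takes effect for these topics.
            "cleanup.policy": "compact",
        }
    }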
Impact
- During hours-long scale testing, all events remain in Kafka indefinitely
- Observed memory growth in the MCGH namespace (Operator/Agent pods) is related to processing continuously growing Kafka data
- Without a cleanup mechanism, long-running production clusters will face storage exhaustion and performance degradation
- The Kafka default 7-day retention does not apply because the topic-level cleanup.policy does not include delete
Proposed Solution
Add an explicit retention policy at the topic level (see the sketch after this list):
- cleanup.policy: compact,delete
- retention.ms: 86400000 (24 hours, should be configurable)
- retention.bytes: 1073741824 (1 GiB per partition, optional)
Alternatively, retention could be configured at the broker level if all topics should share the same retention policy.
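A minimal sketch of the topic-level option, assuming the operator keeps building topic config as a plain key/value map; the helper name and hardcoded values are illustrative only:

    package protocol

    // proposedTopicConfig sketches the retention settings proposed above.
    // Hypothetical helper; the hardcoded values would ultimately come from
    // the MulticlusterGlobalHub CR spec.
    func proposedTopicConfig() map[string]interface{} {
        return map[string]interface{}{
            "cleanup.policy":  "compact,delete", // keep compaction, add time/size-based deletion
            "retention.ms":    "86400000",       // 24 hours; should be configurable
            "retention.bytes": "1073741824",     // 1 GiB per partition, optional bound
        }
    }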
Benefits:
- Data automatically deleted after retention period
- Compaction still works for deduplication efficiency
- Predictable and bounded storage usage
- Prevents memory growth issues related to unlimited Kafka data accumulation
Acceptance Criteria
- Decide whether to configure retention at broker level or topic level (or both with topic overrides)
- Kafka topics configured with both compact and delete cleanup policies
- Default retention time set (recommended: 24-48 hours based on use case)
- Retention parameters (time and bytes) configurable via the MulticlusterGlobalHub CR spec (see the sketch after this list)
- Configuration validated in scale/longevity testing showing bounded memory growth
- Documentation updated explaining broker vs topic level retention configuration, default retention policies, and how to customize retention settings
- Upgrade path tested for existing deployments
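One possible shape for the CR-level knobs called out above, sketched as Go API types; the type and field names are assumptions, not part of the current MulticlusterGlobalHub API:

    package v1alpha4

    // KafkaRetentionSpec is a hypothetical sketch of CR-configurable retention
    // settings; the type and field names are illustrative only.
    type KafkaRetentionSpec struct {
        // RetentionMs is the time-based retention for the built-in topics,
        // in milliseconds (e.g. 86400000 for 24 hours).
        RetentionMs int64 `json:"retentionMs,omitempty"`
        // RetentionBytes is an optional per-partition size bound in bytes
        // (e.g. 1073741824 for 1 GiB).
        RetentionBytes int64 `json:"retentionBytes,omitempty"`
    }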
Additional Context
This issue is related to investigating memory growth observed during scale testing in the MCGH namespace. The current configuration, with cleanup.policy: compact only, means that even the Kafka default 7-day retention is not enforced, leading to indefinite data retention.