  1. Red Hat Advanced Cluster Management
  2. ACM-28008

Simplify Status Conflator Architecture for Better Maintainability


    • Type: Story
    • Resolution: Unresolved
    • Global Hub
    • Future Sustainability

      Background

      The status conflation subsystem in manager/pkg/status/conflator/ is responsible for merging multiple status update bundles from managed hub clusters before persisting them to the PostgreSQL database. This component is critical for reducing database write operations and managing high-throughput status updates at scale.
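
      To make the write reduction concrete, here is a minimal sketch of the basic conflation idea: only the newest pending bundle per hub and bundle type survives until the next database write. The Bundle struct, its Version field, and the Conflate function are hypothetical simplifications for illustration, not the types used in manager/pkg/status/conflator/.

```go
package main

import "fmt"

// Bundle is a simplified, hypothetical status bundle. The real bundle types
// carry more fields than shown here.
type Bundle struct {
	Hub     string // managed hub cluster that produced the bundle
	Type    string // kind of status carried, e.g. "ManagedClusterInfo"
	Version int    // assumed to increase with every update from the same hub
	Payload string
}

// bundleKey identifies the slot a bundle is conflated into.
type bundleKey struct {
	hub, typ string
}

// Conflate keeps only the newest bundle for each (hub, type) pair, so the
// caller performs one database write per pair instead of one per bundle.
// Output order follows the first appearance of each pair.
func Conflate(in []Bundle) []Bundle {
	index := make(map[bundleKey]int) // key -> position in out
	var out []Bundle
	for _, b := range in {
		k := bundleKey{b.Hub, b.Type}
		i, seen := index[k]
		switch {
		case !seen:
			index[k] = len(out)
			out = append(out, b)
		case b.Version > out[i].Version:
			out[i] = b // newer bundle supersedes the pending one
		}
	}
	return out
}

func main() {
	in := []Bundle{
		{"hub1", "ManagedClusterInfo", 1, "a"},
		{"hub1", "ManagedClusterInfo", 2, "b"},
		{"hub2", "ManagedClusterInfo", 1, "c"},
		{"hub1", "ManagedClusterInfo", 3, "d"},
	}
	out := Conflate(in)
	fmt.Printf("%d bundles received, %d writes needed\n", len(in), len(out))
}
```

      In this sketch, four incoming bundles collapse into two pending writes, one per hub.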

      The current architecture consists of three main components:

      1. Dispatcher (dispatcher/dispatcher.go) - Routes bundles between the transport consumer, the conflator, and the handlers
      2. Conflator (conflator/conflator.go) - Merges bundles using different strategies (Complete, Delta, and Hybrid elements)
      3. Handlers (handler/bundle_event_handler.go) - Persist conflated bundles to the database

      Problem Statement

      The current conflation implementation has accumulated significant technical complexity that impacts maintainability, debuggability, and extensibility:

      1. Code Duplication Across Element Types

      The conflator implements three separate element types (CompleteElement, DeltaElement, HybridElement) with duplicated logic for:

      • Bundle registration and retrieval
      • Priority management
      • Timestamp tracking
      • Bundle state transitions

      Each element type reimplements similar functionality with slight variations. This violates the DRY principle, and a bug fixed in one element type has to be found and fixed again in the other two, which makes fixes error-prone.

      2. Dual Processing Path Complexity

      The conflator maintains two distinct processing paths:

      • Complete Element Path: For resources requiring full state replacement (ManagedClusterInfo)
      • Delta/Hybrid Element Path: For resources supporting incremental updates (Policies, Events)

      These paths diverge early in processing and require separate handling throughout the codebase, doubling the testing surface and increasing cognitive load.

      3. Synchronization Overhead

      Multiple mutex locks protect shared state across:

      • Priority queue operations (priorityQueue.mu)
      • Element-specific locks (element.lock)
      • Registration map locks (registeredBundlesLock)

      This multi-level locking strategy is prone to contention under high load and makes deadlock analysis difficult.

      4. Underutilized Features

      Several implemented features are unused or more complex than current usage requires:

      • Statistics collection code (commented out in conflator.go:L127-L135)
      • Priority queue implementation that uses a linear scan instead of a heap
      • Dependency tracking that could be simplified for current use cases
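
      For the priority queue item above, a heap-backed queue is a small amount of code with the Go standard library. The following is only a sketch of the technique using container/heap. The entry type, its fields, and the "lower value is served first" ordering are assumptions for illustration and do not reflect the existing priorityQueue code.

```go
package main

import (
	"container/heap"
	"fmt"
)

// entry is a hypothetical queue item; a lower priority value is served first.
type entry struct {
	name     string
	priority int
}

// entryHeap implements heap.Interface as a min-heap on priority.
type entryHeap []entry

func (h entryHeap) Len() int           { return len(h) }
func (h entryHeap) Less(i, j int) bool { return h[i].priority < h[j].priority }
func (h entryHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *entryHeap) Push(x any)        { *h = append(*h, x.(entry)) }
func (h *entryHeap) Pop() any {
	old := *h
	last := old[len(old)-1]
	*h = old[:len(old)-1]
	return last
}

// DrainInPriorityOrder pushes all entries and pops them back, lowest
// priority value first. Each push and pop costs O(log n).
func DrainInPriorityOrder(entries []entry) []string {
	h := &entryHeap{}
	for _, e := range entries {
		heap.Push(h, e)
	}
	names := make([]string, 0, len(entries))
	for h.Len() > 0 {
		names = append(names, heap.Pop(h).(entry).name)
	}
	return names
}

func main() {
	order := DrainInPriorityOrder([]entry{
		{"events", 2},
		{"managed-cluster-info", 0},
		{"policies", 1},
	})
	fmt.Println(order)
}
```

      Whether the switch is worthwhile depends on how many elements the queue actually holds; for a handful of bundle types a linear scan may be fast enough, so the audit in Phase 1 should measure before replacing it.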

      Proposed Solution

      A phased refactoring approach to reduce complexity while maintaining backward compatibility:

      Phase 1: Audit and Cleanup (1-2 weeks)

      • Remove dead code and unused statistics collection
      • Document actual usage patterns of Complete vs Delta/Hybrid elements
      • Identify opportunities for consolidation
      • Add comprehensive unit tests for existing behavior

      Phase 2: Unified Element Design (2-3 weeks)

      • Create a single ConflationElement interface that uses the strategy pattern for element-specific behavior
      • Extract common logic into shared base implementation
      • Consolidate mutex usage to reduce lock contention
      • Maintain existing external API for compatibility
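
      One possible shape for the Phase 2 design is sketched below, assuming that merge behavior is the only thing that differs between element types. ConflationElement is the name proposed above; MergeStrategy, CompleteStrategy, DeltaStrategy, baseElement, NewElement, and the simplified Bundle type are hypothetical names introduced for this example. A Hybrid strategy is omitted for brevity.

```go
package main

import (
	"fmt"
	"sync"
)

// Bundle is a simplified, hypothetical status bundle.
type Bundle struct {
	Version int
	Objects map[string]string // object name -> reported status
}

// MergeStrategy holds the only behavior that differs between element types.
type MergeStrategy interface {
	Merge(pending *Bundle, incoming Bundle) Bundle
}

// CompleteStrategy replaces the pending state with the incoming bundle.
type CompleteStrategy struct{}

func (CompleteStrategy) Merge(_ *Bundle, incoming Bundle) Bundle { return incoming }

// DeltaStrategy overlays the incoming objects on top of the pending ones.
type DeltaStrategy struct{}

func (DeltaStrategy) Merge(pending *Bundle, incoming Bundle) Bundle {
	merged := Bundle{Version: incoming.Version, Objects: map[string]string{}}
	if pending != nil {
		for name, status := range pending.Objects {
			merged.Objects[name] = status
		}
	}
	for name, status := range incoming.Objects {
		merged.Objects[name] = status
	}
	return merged
}

// ConflationElement is the single interface seen by the rest of the conflator.
type ConflationElement interface {
	Update(b Bundle)
	Take() (Bundle, bool)
}

// baseElement carries the shared logic and is guarded by one mutex.
type baseElement struct {
	mu       sync.Mutex
	strategy MergeStrategy
	pending  *Bundle
}

func NewElement(s MergeStrategy) ConflationElement {
	return &baseElement{strategy: s}
}

// Update drops stale bundles and delegates merging to the strategy.
func (e *baseElement) Update(b Bundle) {
	e.mu.Lock()
	defer e.mu.Unlock()
	if e.pending != nil && b.Version <= e.pending.Version {
		return
	}
	merged := e.strategy.Merge(e.pending, b)
	e.pending = &merged
}

// Take hands the pending bundle to a handler and clears the slot.
func (e *baseElement) Take() (Bundle, bool) {
	e.mu.Lock()
	defer e.mu.Unlock()
	if e.pending == nil {
		return Bundle{}, false
	}
	b := *e.pending
	e.pending = nil
	return b, true
}

func main() {
	for _, s := range []MergeStrategy{CompleteStrategy{}, DeltaStrategy{}} {
		el := NewElement(s)
		el.Update(Bundle{Version: 1, Objects: map[string]string{"policy-a": "Compliant"}})
		el.Update(Bundle{Version: 2, Objects: map[string]string{"policy-b": "NonCompliant"}})
		b, _ := el.Take()
		fmt.Printf("%T keeps %d object(s)\n", s, len(b.Objects))
	}
}
```

      In this sketch the version check, the pending slot, and the locking live in one place, so each strategy only has to express how two bundles combine. It also shows the "single lock per conflation element" criterion in its simplest form.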

      Phase 3: Channel-Based Architecture (2-3 weeks)

      • Simplify dispatcher to be a pure routing component
      • Implement backpressure mechanisms using buffered channels
      • Reduce goroutine count through worker pools
      • Add metrics for conflation queue depth and processing latency
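
      The backpressure and worker pool items above can be illustrated with a bounded channel feeding a fixed set of goroutines. This is a sketch under stated assumptions: RunPool, its parameters, and the string job type are hypothetical, and the real dispatcher would pass bundles rather than strings.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// RunPool feeds jobs through a bounded queue to a fixed number of workers and
// returns how many jobs were handled. The names and signature are hypothetical.
// It expects workers >= 1; with no workers the producer would block forever.
func RunPool(jobs []string, workers, queueSize int, handle func(string)) int {
	queue := make(chan string, queueSize)
	var handled int64
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for job := range queue {
				handle(job)
				atomic.AddInt64(&handled, 1)
			}
		}()
	}

	for _, job := range jobs {
		// A send blocks once queueSize jobs are waiting, which slows the
		// producer down instead of letting the backlog grow without bound.
		queue <- job
	}
	close(queue) // lets the workers exit after draining the queue
	wg.Wait()
	return int(atomic.LoadInt64(&handled))
}

func main() {
	jobs := make([]string, 20)
	for i := range jobs {
		jobs[i] = fmt.Sprintf("bundle-%d", i)
	}
	n := RunPool(jobs, 3, 4, func(string) {})
	fmt.Println("handled:", n)
}
```

      The goroutine count is fixed by the workers argument regardless of load, and len(queue) returns the number of buffered jobs at any moment, which could back a queue depth metric.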

      Scope

      The refactoring will touch the following key files:

      File                                          Current Responsibility       Refactoring Impact
      --------------------------------------------  ---------------------------  ----------------------------------
      conflator/conflator.go                        Main conflation engine       High - Consolidate element types
      conflator/startable_conflator.go              Lifecycle management         Low - Interface changes only
      conflator/dependency/dependency_conflator.go  Dependency-aware conflation  Medium - Simplify dependency logic
      conflator/metadata/metadata_conflator.go      Metadata handling            Medium - Unified element handling
      dispatcher/dispatcher.go                      Message routing              High - Simplify to pure router
      handler/bundle_event_handler.go               Database persistence         Low - No changes expected
      manager.go                                    Component coordination       Low - Minimal wiring changes
      syncers/managed_hub_status_syncer.go          Status sync orchestration    Low - No changes expected

      Acceptance Criteria

      • [ ] Cyclomatic complexity and code duplication reduced relative to the pre-refactoring baseline
      • [ ] Element type implementations consolidated into a single strategy-based design
      • [ ] Mutex usage reduced to a single lock per conflation element
      • [ ] Dead code and unused features removed
      • [ ] Processing latency maintained or improved (benchmark tests)
      • [ ] All existing unit tests pass without modification
      • [ ] All integration tests pass (make integration-test/manager)
      • [ ] All E2E tests pass (make e2e-test-all)
      • [ ] Architecture decision record (ADR) written to document the design choices
      • [ ] Code review completed by 2+ team members
      • [ ] Performance testing shows no regression under load

      Non-Goals

      • Changing the external API of the status conflator
      • Modifying database schema or persistence logic
      • Altering CloudEvents message format or transport layer
      • Adding new conflation strategies or element types

      Note

      This description was generated with assistance from Claude Code.

              rh-ee-myan Meng Yan
              rh-ee-myan Meng Yan
              Yaheng Liu Yaheng Liu