Red Hat Advanced Cluster Management: ACM-21413

Real-Time Data Streaming for Search API

      Feature Overview

      This feature aims to evolve the Red Hat Advanced Cluster Management (ACM) Search functionality from a polling-based data retrieval strategy to a real-time streaming or "watch" mechanism. This will allow for instant updates and a more responsive user experience for ACM administrators, reducing latency and improving the freshness of search results.

      Goals

      • Enable real-time updates for Search data, reducing the delay between data changes in the backend and their reflection in the Search UI.
      • Improve the responsiveness and user experience of ACM Search by eliminating the need for periodic polling.

      Requirements

      • CI MUST be running successfully with test automation. This is a requirement for ALL features. (MVP)
      • Release Technical Enablement: provide the necessary release enablement details and documents. (MVP)
      • The Search API (GraphQL) MUST support a mechanism for clients to subscribe to data changes. (MVP)
      • The Search backend (PostgreSQL) MUST provide a reliable method to capture data changes for streaming. (MVP)
      • The streaming mechanism MUST be scalable to handle a significant volume of data changes and concurrent subscriptions without negatively impacting system performance. (MVP)
      • The implementation MUST ensure data consistency and avoid missing or duplicating events.
      • A clear communication protocol MUST be established between the PostgreSQL backend, the GraphQL API, and the clients for real-time updates.
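
      One common way to meet the subscription and protocol requirements above is GraphQL subscriptions carried over WebSockets using the graphql-transport-ws protocol (the protocol implemented by the graphql-ws library). The sketch below assumes that protocol; the subscription field searchResourceEvents, its arguments, and the authorization key in the connection_init payload are hypothetical and not part of any existing Search API schema. It only builds and dispatches JSON messages, so it runs without a server.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable

# Message types defined by the graphql-transport-ws protocol.
CONNECTION_INIT = "connection_init"
CONNECTION_ACK = "connection_ack"
SUBSCRIBE = "subscribe"
NEXT = "next"
ERROR = "error"
COMPLETE = "complete"

# HYPOTHETICAL subscription document: field name and arguments are illustrative only.
SEARCH_SUBSCRIPTION = """
subscription Watch($kind: String!) {
  searchResourceEvents(kind: $kind) { operation uid cluster seq }
}
"""


def connection_init(token: str) -> str:
    """First client message. The payload is free-form; the 'authorization' key is an assumption."""
    return json.dumps({"type": CONNECTION_INIT, "payload": {"authorization": f"Bearer {token}"}})


def subscribe(op_id: str, kind: str) -> str:
    """Start one subscription, identified by op_id, after connection_ack arrives."""
    return json.dumps({
        "id": op_id,
        "type": SUBSCRIBE,
        "payload": {"query": SEARCH_SUBSCRIPTION, "variables": {"kind": kind}},
    })


@dataclass
class SubscriptionDispatcher:
    """Routes server messages to per-subscription callbacks (ping/pong not handled)."""
    handlers: dict[str, Callable[[dict[str, Any]], None]] = field(default_factory=dict)
    acked: bool = False
    errors: list[Any] = field(default_factory=list)

    def on_message(self, raw: str) -> None:
        msg = json.loads(raw)
        kind = msg["type"]
        if kind == CONNECTION_ACK:
            self.acked = True
        elif kind == NEXT:
            handler = self.handlers.get(msg["id"])
            if handler is not None:
                handler(msg["payload"])  # payload is a GraphQL execution result
        elif kind == ERROR:
            # An error message carries a list of GraphQL errors and ends that subscription.
            self.errors.append(msg["payload"])
            self.handlers.pop(msg["id"], None)
        elif kind == COMPLETE:
            # The server finished this subscription; stop routing to it.
            self.handlers.pop(msg["id"], None)
```

      A real client would send connection_init, wait for connection_ack, then send one subscribe message per watch; socket handling and keep-alives are left out.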

      (Optional) Use Cases

      • Main success scenarios - high-level user stories
        • As an ACM admin, I want to see newly discovered clusters and resources appear in my Search results instantly, without manual refreshes.
        • As an ACM admin, I want to be immediately notified of critical resource state changes (e.g., a cluster going offline) through the Search interface.
        • As an ACM admin, I want to observe real-time updates to policies and their compliance status in the Search view.
      • Alternate flow/scenarios - high-level user stories
        • If a network disruption occurs, the client's subscription should gracefully attempt to re-establish and resynchronize with the data stream.
        • The system should handle potential backpressure if a client is unable to process updates as quickly as they are generated.
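
      The reconnection flow above can be sketched as a consumer loop that retries with exponential backoff and full jitter, and resumes from the last sequence number it processed. This is a minimal sketch assuming the server can replay events after a given sequence number; connect, the (seq, payload) event shape, and StreamDisconnected are hypothetical stand-ins for the real transport. sleep is injected so the loop can be tested without waiting.

```python
import random
from typing import Callable, Iterable, Iterator, Optional, Tuple


class StreamDisconnected(Exception):
    """Raised by a (hypothetical) stream when the connection drops."""


def backoff_delays(base: float, cap: float, rng: random.Random) -> Iterator[float]:
    """Exponential backoff with full jitter: uniform(0, min(cap, base * 2**n))."""
    attempt = 0
    while True:
        # Exponent is bounded so the float never overflows on long outages.
        yield rng.uniform(0, min(cap, base * 2 ** min(attempt, 32)))
        attempt += 1


def consume_with_reconnect(
    connect: Callable[[int], Iterable[Tuple[int, str]]],
    on_event: Callable[[int, str], None],
    sleep: Callable[[float], None],
    max_attempts: int = 5,
    base: float = 0.5,
    cap: float = 30.0,
    seed: Optional[int] = None,
) -> int:
    """Consume (seq, payload) events, reconnecting after drops.

    connect(from_seq) is HYPOTHETICAL: it opens a stream that yields only
    events with seq > from_seq, so the client resumes where it stopped.
    Returns the last sequence number processed when the stream ends cleanly.
    """
    rng = random.Random(seed)
    delays = backoff_delays(base, cap, rng)
    last_seq = 0
    failures = 0
    while True:
        progressed = False
        try:
            for seq, payload in connect(last_seq):
                on_event(seq, payload)
                last_seq = seq
                progressed = True
            return last_seq  # the server ended the stream cleanly
        except StreamDisconnected:
            if progressed:
                # This connection delivered data, so restart the backoff schedule.
                failures = 0
                delays = backoff_delays(base, cap, rng)
            failures += 1
            if failures > max_attempts:
                raise
            sleep(next(delays))
```

      Jitter spreads reconnect attempts so that clients dropped by the same server restart do not all reconnect in lockstep. If the server can no longer replay from last_seq, the client would need a fresh initial load rather than a resume.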

      Technical Considerations/Limitations (Non-Expert Level)

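      • PostgreSQL CDC Overhead: PostgreSQL's logical replication is efficient, but enabling Change Data Capture (CDC) still adds load on the database server. Logical decoding requires wal_level = logical, which increases WAL volume, and a replication slot retains WAL until its consumer confirms it, so a stalled consumer can fill the disk. This needs careful monitoring and tuning to ensure it doesn't degrade overall database performance, especially in high-volume environments.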
      • GraphQL Subscriptions Complexity: GraphQL subscriptions, while powerful, can be more complex to implement and scale than simple queries. They typically require a persistent connection (like WebSockets) and a robust publish-subscribe (pub/sub) system in the backend to manage events and distribute them to subscribed clients. This adds architectural complexity.
      • Network Stability & Reconnection Logic: Real-time streaming relies heavily on network stability. The client-side implementation will need sophisticated reconnection and synchronization logic to handle temporary network outages or server restarts without losing data or creating inconsistencies for the user.
      • Data Volume & Filtering: For very large-scale ACM environments, the raw volume of change events from PostgreSQL could be substantial. Efficient filtering and projection of data at the GraphQL layer will be crucial to avoid overwhelming clients with unnecessary information. The design should consider how to allow clients to subscribe to only the relevant changes.
      • Security & Authorization for Streams: Implementing fine-grained authorization for streaming data can be more challenging than for traditional polling. Ensuring that clients only receive updates for data they are authorized to view is paramount.
      • State Management: For some types of real-time views, maintaining client-side state in sync with the streaming data can be tricky. Considerations for cursor management or event sequencing might be necessary.
      • Error Handling in Streams: Robust error handling within the streaming pipeline (from Postgres to GraphQL to client) is essential to identify and address issues like malformed events, processing failures, or dropped connections.
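
      Several of the considerations above (pub/sub fan-out, per-client filtering, authorization, and backpressure) meet in the component that distributes change events to subscribers. The sketch below is an in-process illustration under stated assumptions: ChangeEvent, Subscription, and Broker are hypothetical names, authorization is reduced to a set of cluster names the user may see, and a deployment with several API replicas would need a shared pub/sub layer instead.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class ChangeEvent:  # HYPOTHETICAL event shape
    seq: int
    operation: str  # "INSERT" | "UPDATE" | "DELETE"
    cluster: str
    kind: str
    uid: str


@dataclass
class Subscription:
    allowed_clusters: frozenset                # authorization: clusters this user may see
    predicate: Callable[[ChangeEvent], bool]   # client-requested filter
    capacity: int = 1000                       # max buffered events before overflow
    queue: deque = field(default_factory=deque)
    lagged: bool = False                       # set when the client fell too far behind

    def offer(self, event: ChangeEvent) -> None:
        if self.lagged:
            return  # already told to resync; stop buffering
        if len(self.queue) >= self.capacity:
            # Overflow: discard the buffer and require a resync instead of
            # blocking the publisher or silently dropping individual events.
            self.queue.clear()
            self.lagged = True
            return
        self.queue.append(event)


class Broker:
    """In-process fan-out of change events to filtered, authorized subscriptions."""

    def __init__(self) -> None:
        self._subs: dict[int, Subscription] = {}
        self._next_id = 0

    def subscribe(self, sub: Subscription) -> int:
        self._next_id += 1
        self._subs[self._next_id] = sub
        return self._next_id

    def unsubscribe(self, sub_id: int) -> None:
        self._subs.pop(sub_id, None)

    def publish(self, event: ChangeEvent) -> None:
        for sub in self._subs.values():
            # Authorization is checked per event, before the client's own filter.
            if event.cluster in sub.allowed_clusters and sub.predicate(event):
                sub.offer(event)
```

      The overflow policy is a design choice: blocking the publisher would let one slow client stall every other subscriber, and dropping old events would break the no-missing-events requirement, so the sketch marks the subscription as lagged and expects the client to reload and resubscribe.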

      Out of Scope

      • Real-time bidirectional communication where clients can send streaming data to the backend (focus is on server-to-client streaming).
      • Replacing all existing polling mechanisms entirely; polling may still be used for initial data loads or less frequently updated data.
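
      When polling (or a one-off query) still provides the initial load, the client has to merge that snapshot with the stream without losing or double-applying events. A common approach: subscribe first and buffer events, load the snapshot together with the last sequence number it reflects, then apply buffered and live events in order, skipping anything already covered and requesting a resync on a gap. The sketch assumes, hypothetically, that each stream carries contiguous sequence numbers and that the snapshot reports its sequence number; LiveView and ResyncRequired are illustrative names.

```python
from dataclasses import dataclass
from typing import Any, Optional


class ResyncRequired(Exception):
    """Raised when a sequence gap means events were missed."""


@dataclass
class LiveView:
    """Client-side view built from a snapshot plus sequenced change events."""
    rows: dict
    last_seq: int

    @classmethod
    def from_snapshot(cls, rows: dict, snapshot_seq: int) -> "LiveView":
        return cls(rows=dict(rows), last_seq=snapshot_seq)

    def apply(self, seq: int, operation: str, uid: str,
              data: Optional[dict[str, Any]] = None) -> bool:
        """Apply one event in order; return False if it was a duplicate and skipped."""
        if seq <= self.last_seq:
            return False  # already reflected in the snapshot or applied before
        if seq != self.last_seq + 1:
            # A gap means at least one event was missed: reload instead of guessing.
            raise ResyncRequired(f"expected seq {self.last_seq + 1}, got {seq}")
        if operation == "DELETE":
            self.rows.pop(uid, None)
        elif operation in ("INSERT", "UPDATE"):
            self.rows[uid] = data
        else:
            raise ValueError(f"unknown operation {operation!r}")
        self.last_seq = seq
        return True
```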

      Background, and strategic fit

      Currently, ACM Search relies on a polling mechanism to fetch data, which can lead to delays in reflecting the most up-to-date information. As ACM environments grow in scale and complexity, the need for real-time visibility becomes critical for effective management and rapid response to operational issues. This feature will transform ACM Search into a more dynamic and proactive tool, aligning with the broader strategic goal of providing a comprehensive, real-time view of the managed clusters and resources. This will significantly enhance the debugging experience, enabling administrators to quickly pinpoint and react to changes across their infrastructure.

      Assumptions

      • A suitable GraphQL subscription implementation (e.g., using WebSockets or Server-Sent Events) can be integrated with the existing Search API.
      • PostgreSQL's Change Data Capture (CDC) capabilities (e.g., Logical Decoding/Replication) are robust and performant enough for the anticipated volume of data changes in ACM.
      • The team has, or can acquire, the expertise needed to implement and maintain real-time data pipelines from PostgreSQL to GraphQL.
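
      As an illustration of the logical-decoding assumption, the sketch below decodes messages produced by the wal2json output plugin in its format-version 1 output, where each transaction arrives as a JSON object with a change array. Using wal2json (rather than, for example, pgoutput), the search.resources table name, and the RowChange type are all assumptions; reading from the replication slot itself (for example with pg_recvlogical or a driver's replication protocol support) and acknowledging consumed positions are not shown.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class RowChange:  # HYPOTHETICAL normalized change record
    operation: str                       # "INSERT" | "UPDATE" | "DELETE"
    table: str                           # schema-qualified, e.g. "search.resources"
    new: Optional[dict[str, Any]]        # row after the change (None for DELETE)
    old_keys: Optional[dict[str, Any]]   # replica-identity key (UPDATE/DELETE)


def decode_wal2json_v1(message: str, tables: frozenset) -> list[RowChange]:
    """Turn one wal2json format-version 1 transaction message into row changes.

    Only changes to the given schema-qualified tables are kept, which is one
    place to reduce CDC volume before events reach the GraphQL layer.
    """
    changes = []
    for c in json.loads(message).get("change", []):
        table = f'{c["schema"]}.{c["table"]}'
        if table not in tables:
            continue
        new = None
        if "columnnames" in c:  # present for insert and update
            new = dict(zip(c["columnnames"], c["columnvalues"]))
        old = None
        if "oldkeys" in c:  # present for update and delete
            old = dict(zip(c["oldkeys"]["keynames"], c["oldkeys"]["keyvalues"]))
        changes.append(RowChange(c["kind"].upper(), table, new, old))
    return changes
```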

      Customer Considerations

      • Customers will experience significantly improved responsiveness and data freshness in ACM Search, leading to better operational awareness.
      • The ability to see real-time changes will reduce the time spent debugging and troubleshooting issues, as information will be immediately available.
      • This feature sets the stage for more advanced real-time monitoring and automation features in the future.

      Documentation Considerations

      Questions to be addressed:

      • What educational or reference material (docs) is required to support this product feature?
        • Documentation for users/admins, and potentially for developers extending Search.
      • Does this feature have a doc impact?
        • Yes, significant.
      • What concepts (e.g., how streaming works, potential caveats) do customers need to understand to be successful with real-time Search?
      • How do we expect customers will use the feature? For what purpose(s)?
        • To monitor dynamic cluster states, track resource changes, and react to events as they happen.
      • What reference material might a customer want/need to understand the architecture or troubleshoot streaming issues?
      • Is there source material that can be used as reference for the Technical Writer in writing the content?
        • Yes, design documents for the GraphQL subscription implementation and the PostgreSQL CDC strategy.

      What is the doc impact (New Content, Updates to existing content, or Release Note)?

      • New Content:
        • Conceptual documentation explaining real-time Search and its benefits.
        • Guides on how to utilize new streaming capabilities in the UI (if applicable).
        • Troubleshooting guides specific to streaming data.
        • Developer guides if exposing direct GraphQL subscription endpoints.
      • Updates to existing content:
        • Updates to the Search overview to highlight real-time capabilities.
        • Performance considerations and best practices for large-scale environments.
      • Release Note: A release note detailing the new real-time streaming functionality for ACM Search.

      Assignee: Unassigned
      Reporter: Jorge Padilla (jpadilla@redhat.com)