Project Quay / PROJQUAY-10169

[Audit] Splunk-Based Audit Log Display in Quay UI

    • Type: Epic
    • Resolution: Unresolved
    • Priority: Normal
    • Splunk Audit Log Display
    • To Do

      Overview

      Enable viewing and querying of audit logs directly in the Quay UI when Splunk log forwarding is enabled. Currently, Splunk integration only supports writing logs; this feature adds the ability to read logs back from Splunk for display in the organization and superuser admin panels, maintaining feature parity with Elasticsearch and database-backed audit log storage.

      Context

      Quay 3.9 introduced Splunk as an audit log forwarding target (PROJQUAY-4993), but unlike the Elasticsearch and database backends, the Splunk integration was implemented as write-only. Users who enable Splunk log forwarding lose the ability to view audit logs in the Quay UI; they must access Splunk directly and write SPL queries to audit their organizations. This is a regression in user experience compared to the other log backends, and it creates friction for organizations that have standardized on Splunk for log aggregation.

      The existing codebase has a well-defined logs model interface (ActionLogsDataInterface) and a complete reference implementation in Elasticsearch (DocumentLogsModel) that can guide the Splunk implementation.

      Reference: PROJQUAY-6209

      Scope

      In Scope

      • Implement Splunk log reading via the Splunk Search API
      • Support lookup_logs() for paginated log retrieval in org/repo/user views
      • Support lookup_latest_logs() for recent log display
      • Support get_aggregated_log_counts() for log statistics
      • Support yield_logs_for_export() for log export functionality
      • Proper pagination through Splunk search results
      • Search/filtering of audit events by performer, repository, and date range
      • Organization-scoped log queries (org admins see only their org's logs)
      • Registry-wide log queries in superuser panel
      • Configuration options for Splunk search parameters (index, sourcetype)

      Out of Scope

      • Changes to Splunk log writing (already implemented)
      • Splunk Cloud-specific integrations (focus on Splunk Enterprise API)
      • Real-time log streaming/tailing
      • Custom SPL query interface in Quay UI
      • Log rotation/archival operations (yield_log_rotation_context)

      Child Stories

      1. Splunk Search API Integration Layer: Implement core Splunk search functionality using the splunklib SDK. Create a search job, poll for completion, and retrieve results. Handle authentication, SSL, and connection configuration.
      2. Implement lookup_logs() for Splunk: Add paginated log retrieval from Splunk. Build SPL queries with filters for account, performer, repository, and date range. Implement cursor-based pagination using search result offsets or datetime bounds.
      3. Implement lookup_latest_logs() for Splunk: Add recent log retrieval for dashboard displays. Query recent logs (last 30 days) sorted by datetime descending.
      4. Implement get_aggregated_log_counts() for Splunk: Add log statistics aggregation using SPL "stats count by" queries. Group by kind_id and date for the aggregate logs views.
      5. Implement yield_logs_for_export() for Splunk: Add log export functionality for generating downloadable audit reports. Stream results from Splunk search jobs for large exports.
      6. Field Mapping and Data Transformation: Map Splunk stored fields back to Quay's Log datatype. Convert datetime strings, reconstruct user/performer information, handle metadata JSON parsing.
      7. Configuration Schema Updates: Add new LOGS_MODEL_CONFIG options for Splunk reading: search timeout, result limits, index pattern, and optional search prefix customization.
      8. UI Integration Testing: Validate audit log display in organization logs view, user logs view, repository logs view, and superuser panel. Verify pagination, filtering, and export functionality.
      9. Documentation: Update Splunk integration documentation to describe read capabilities, required Splunk permissions, and configuration options.
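
      Story 6 above could start from a mapping layer along these lines. This is a minimal sketch, not Quay's code: LogRecord is a hypothetical stand-in for the Log datatype, and the field names (kind, account, performer, repository, ip, metadata_json, datetime) and the ISO 8601 datetime format are assumptions about what the writer stores in Splunk.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, Optional


@dataclass
class LogRecord:
    """Hypothetical stand-in for Quay's Log datatype (field names are assumptions)."""

    kind: str
    account: Optional[str] = None
    performer: Optional[str] = None
    repository: Optional[str] = None
    ip: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)
    timestamp: Optional[datetime] = None


def parse_datetime(value: str) -> datetime:
    """Parse an ISO 8601 string (assumed storage format) into an aware UTC datetime."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: naive timestamps were written in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def parse_metadata(raw: Any) -> Dict[str, Any]:
    """Return metadata as a dict; malformed or non-object JSON yields an empty dict."""
    if isinstance(raw, dict):
        return raw
    if not raw:
        return {}
    try:
        parsed = json.loads(raw)
    except (TypeError, ValueError):
        return {}
    return parsed if isinstance(parsed, dict) else {}


def splunk_result_to_log(result: Dict[str, Any]) -> LogRecord:
    """Map one Splunk result row (a dict of strings) onto a LogRecord."""
    raw_datetime = result.get("datetime")
    return LogRecord(
        kind=result.get("kind", ""),
        account=result.get("account") or None,
        performer=result.get("performer") or None,
        repository=result.get("repository") or None,
        ip=result.get("ip") or None,
        metadata=parse_metadata(result.get("metadata_json")),
        timestamp=parse_datetime(raw_datetime) if raw_datetime else None,
    )
```

      Empty strings are mapped to None so that downstream code does not have to distinguish "missing" from "blank"; whether that matches the Elasticsearch model's behavior would need to be checked against DocumentLogsModel.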

      Dependencies

      • Technical:
        • Splunk Enterprise with search API access (existing for write support)
        • splunklib Python SDK (already in requirements for writing)
        • Existing ActionLogsDataInterface and LogsModelProxy infrastructure
        • Reference implementation in DocumentLogsModel (Elasticsearch)
      • Cross-team:
        • Documentation team for Splunk integration guide updates
        • QE for Splunk-specific test scenarios
      • External:
        • Splunk SDK compatibility and API stability
        • Customer Splunk deployments must allow search API access (role permissions)

      Success Criteria

      • [ ] Org admins can view audit logs in UI when Splunk log forwarding is enabled
      • [ ] Superusers can view registry-wide audit logs in superuser panel
      • [ ] Log display has feature parity with Elasticsearch backend (same columns, filters)
      • [ ] Pagination works correctly for large result sets
      • [ ] Date range filtering works correctly across timezones
      • [ ] Performer and repository filtering returns accurate results
      • [ ] Aggregate log counts display correctly in charts/summaries
      • [ ] Log export generates complete, accurate CSV/JSON files
      • [ ] No performance degradation for typical log queries ( second response)
      • [ ] Graceful error handling when Splunk is unavailable

      Technical Approach

      Components Affected

      • data/logs_model/splunk_logs_model.py: Implement all read methods currently raising NotImplementedError
      • data/logs_model/splunk_logs_producer.py: Add search capabilities alongside existing write functionality
      • data/logs_model/datatypes.py: May need minor updates for Splunk-specific pagination tokens
      • util/config/schema.py: Add Splunk read configuration options
      • endpoints/api/logs.py: No changes needed (uses LogsModelProxy abstraction)
      • web/src/: No changes needed (uses existing API endpoints)

      Key Technical Decisions

      • SPL Query Pattern: Use structured SPL with filters rather than raw searches for security and predictability
      • Pagination Strategy: Use offset-based pagination with Splunk's result count/offset parameters
      • Field Storage: Rely on indexed fields (account, performer, repository, kind, datetime) for efficient filtering
      • Search Timeout: Configure appropriate timeout for long-running searches; default 60 seconds
      • Result Limits: Enforce maximum result size to prevent memory issues; use streaming for exports
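
      The offset-based pagination decision above can be sketched as follows. The fetch callable is a hypothetical wrapper around whatever call returns Splunk results for a given offset and count; the token format (base64-encoded JSON) is also an assumption, and a real implementation would likely reuse Quay's existing page-token handling.

```python
import base64
import json
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical signature: fetch(offset, count) -> list of result rows.
Fetch = Callable[[int, int], List[Dict[str, Any]]]


def encode_token(offset: int) -> str:
    """Pack the next offset into an opaque, URL-safe page token."""
    payload = json.dumps({"offset": offset}).encode("utf-8")
    return base64.urlsafe_b64encode(payload).decode("ascii")


def decode_token(token: Optional[str]) -> int:
    """Return the offset stored in a token; no token means the first page."""
    if not token:
        return 0
    try:
        offset = json.loads(base64.urlsafe_b64decode(token.encode("ascii")))["offset"]
    except (ValueError, KeyError, TypeError) as exc:
        raise ValueError("invalid page token") from exc
    if not isinstance(offset, int) or offset < 0:
        raise ValueError("invalid page token")
    return offset


def lookup_page(
    fetch: Fetch,
    page_token: Optional[str],
    page_size: int = 20,
    max_page_size: int = 100,
) -> Tuple[List[Dict[str, Any]], Optional[str]]:
    """Fetch one page and the token for the next page (None on the last page)."""
    size = max(1, min(page_size, max_page_size))
    offset = decode_token(page_token)
    # Ask for one extra row so "is there another page?" needs no second query.
    rows = fetch(offset, size + 1)
    has_more = len(rows) > size
    next_token = encode_token(offset + size) if has_more else None
    return rows[:size], next_token
```

      Offsets are only stable while the underlying search results do not change between requests, which is one reason the child stories also mention datetime bounds as an alternative cursor.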

      SPL Query Examples

      # Basic log lookup (earliest/latest use Splunk's default %m/%d/%Y:%H:%M:%S time format)
      search index=quay_logs account="myorg" earliest="01/01/2024:00:00:00" latest="01/31/2024:23:59:59" | sort -datetime | head 100
      
      # Aggregated counts per kind and day (assumes _time carries the log's datetime)
      search index=quay_logs account="myorg" | bin _time span=1d | stats count by kind, _time | sort _time
      
      # Performer filter
      search index=quay_logs account="myorg" performer="admin" | sort -datetime
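
      Queries like the ones above would normally be assembled in code rather than by string concatenation of user input. The sketch below is one way to do that under stated assumptions: the function and parameter names are hypothetical, the backslash escaping of quoted values should be verified against the target Splunk version, and the time bounds are emitted as epoch seconds on the assumption that Splunk accepts UNIX time for earliest and latest.

```python
from datetime import datetime, timezone
from typing import Optional


def _quote(value: str) -> str:
    """Quote a value for SPL, escaping backslashes and double quotes
    so the value cannot terminate its own quoted term."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def _epoch(moment: datetime) -> int:
    """Convert to epoch seconds; naive datetimes are assumed to be UTC."""
    if moment.tzinfo is None:
        moment = moment.replace(tzinfo=timezone.utc)
    return int(moment.timestamp())


def build_lookup_query(
    index: str,
    account: str,
    start: datetime,
    end: datetime,
    performer: Optional[str] = None,
    repository: Optional[str] = None,
    limit: int = 100,
) -> str:
    """Build a filtered SPL search; the account filter is always present."""
    terms = [f"search index={_quote(index)}", f"account={_quote(account)}"]
    if performer:
        terms.append(f"performer={_quote(performer)}")
    if repository:
        terms.append(f"repository={_quote(repository)}")
    # Epoch bounds avoid depending on the search-time time format setting.
    terms.append(f"earliest={_epoch(start)}")
    terms.append(f"latest={_epoch(end)}")
    return " ".join(terms) + f" | sort -datetime | head {int(limit)}"
```

      Making account a required argument keeps the org-scoping guarantee in one place instead of relying on every caller to add the filter.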
      

      Risks and Mitigations

      • Risk: Splunk search API performance varies widely based on data volume and Splunk cluster size
        Mitigation: Implement configurable timeouts; provide guidance on Splunk index optimization; use efficient SPL patterns
      • Risk: Splunk permissions model may not align with Quay's org-scoped access requirements
        Mitigation: Queries always include account filter; document required Splunk role permissions
      • Risk: Field format differences between write and read (datetime format, JSON escaping)
        Mitigation: Comprehensive field mapping layer; unit tests with real Splunk data samples
      • Risk: Large log exports may time out or consume excessive memory
        Mitigation: Use streaming exports; implement chunked result retrieval; add export size limits
      • Risk: Splunk SDK version compatibility issues
        Mitigation: Pin SDK version; test against supported Splunk Enterprise versions
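
      The export mitigation above (streaming, chunked retrieval, size limits) might look roughly like this. It is a sketch under assumptions: fetch is a hypothetical callable that returns result rows for an offset and count, and the default chunk size, row cap, and time budget are placeholder values rather than proposed settings.

```python
import time
from typing import Any, Callable, Dict, Iterator, List

# Hypothetical signature: fetch(offset, count) -> list of result rows.
Fetch = Callable[[int, int], List[Dict[str, Any]]]


class ExportLimitExceeded(Exception):
    """Raised when an export would return more rows than the configured cap."""


def yield_export_chunks(
    fetch: Fetch,
    chunk_size: int = 1000,
    max_rows: int = 100_000,
    timeout_seconds: float = 3600.0,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[List[Dict[str, Any]]]:
    """Yield result rows chunk by chunk so only one chunk is held in memory."""
    deadline = clock() + timeout_seconds
    offset = 0
    while True:
        if clock() > deadline:
            raise TimeoutError("export exceeded its time budget")
        rows = fetch(offset, chunk_size)
        if not rows:
            return
        if offset + len(rows) > max_rows:
            raise ExportLimitExceeded(f"export exceeds {max_rows} rows")
        yield rows
        if len(rows) < chunk_size:
            # A short chunk means the search results are exhausted.
            return
        offset += len(rows)
```

      Because the function is a generator, a caller such as yield_logs_for_export() can write each chunk to the export file before the next one is requested.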

      Testing Strategy

      • Unit testing: Mock Splunk search API responses; test field mapping and pagination logic
      • Integration testing: Test against real Splunk instance with sample data
      • API testing: Verify all log endpoints work correctly with Splunk backend
      • UI testing: Validate log views in org settings, repo settings, and superuser panel
      • Performance testing: Measure query times with various data volumes (100, 10K, 100K+ logs)
      • Error handling testing: Simulate Splunk unavailability, timeouts, and permission errors
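
      For the unit and error-handling tests above, the job-polling loop can be written against a duck-typed job so that a fake stands in for Splunk. The sketch below assumes only that the job object exposes is_done() and cancel(); splunklib's Job provides methods with these names, but the exception class, the helper name, and the fakes are hypothetical.

```python
import time
from typing import Callable, Protocol


class SearchJob(Protocol):
    """Minimal surface the polling loop needs from a search job."""

    def is_done(self) -> bool: ...
    def cancel(self) -> object: ...


class SplunkSearchTimeout(Exception):
    """Raised when a search job does not finish within the timeout."""


def wait_for_job(
    job: SearchJob,
    timeout_seconds: float = 60.0,
    poll_interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll until the job is done; cancel it and raise once the deadline passes."""
    deadline = clock() + timeout_seconds
    while not job.is_done():
        if clock() >= deadline:
            job.cancel()  # Free the search slot on the Splunk side.
            raise SplunkSearchTimeout(f"search exceeded {timeout_seconds}s")
        sleep(poll_interval)


class FakeJob:
    """Test double: reports not-done for a fixed number of polls."""

    def __init__(self, polls_until_done: int) -> None:
        self.remaining = polls_until_done
        self.cancelled = False

    def is_done(self) -> bool:
        self.remaining -= 1
        return self.remaining < 0

    def cancel(self) -> None:
        self.cancelled = True


class FakeClock:
    """Test double: time only advances when sleep() is called."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds
```

      Injecting clock and sleep keeps timeout tests instantaneous and deterministic, which matters when the default timeout is 60 seconds.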

      Rollout Strategy

      • Feature parity: Must work with existing LOGS_MODEL=splunk configuration
      • Backward compatible: No changes to write behavior or existing configuration
      • Graceful degradation: Clear error messages if Splunk search fails
      • No migration needed: Reads from same index/sourcetype used for writes
      • Rollback plan: If issues occur, users can continue to query their logs directly in the Splunk UI

      Documentation Needs

      • Admin guide: Splunk integration with read capabilities
      • Admin guide: Required Splunk permissions for search API access
      • Admin guide: Splunk index configuration recommendations for performance
      • Troubleshooting: Common Splunk connection and permission issues
      • API reference: Log endpoint behavior with Splunk backend

      Related Work

      • Original Feature: PROJQUAY-6209
      • Splunk Write Support: PROJQUAY-4993 (completed - reference for architecture)
      • Related RFEs: RFE-4331, RFE-4500 (view Splunk logs in Quay UI)
      • Reference Implementation: DocumentLogsModel (Elasticsearch) in data/logs_model/document_logs_model.py

              Assignee: Unassigned
              Reporter: Brandon Caton (bcaton@redhat.com)