-
Story
-
Resolution: Unresolved
-
Normal
User Story
As a hosted control plane operator, I want to optionally enable persistent volume storage for audit logs so that audit events recorded before an OOM event are retained, improving observability and debugging.
Background
This story enhances observability for hosted control planes by providing an opt-in mechanism to persist audit logs to persistent volumes. While independent of the dynamic scaling functionality in OCPSTRAT-1896, this feature complements the overall observability improvements by ensuring critical audit data is preserved, especially during resource pressure scenarios that might lead to OOM events.
Problem Statement
Currently, audit logs from the Kube APIServer in hosted control planes may be lost during OOM events or control plane restarts, making it difficult to diagnose what led to the failure. Having persistent storage for audit logs would provide crucial forensic data for troubleshooting.
Acceptance Criteria
- Implement an opt-in configuration option to enable audit log persistence to persistent volumes
- Ensure the feature works with existing Kubernetes audit log configurations and policies
- Provisioned storage should be large enough to hold the rotated audit logs from the Kube APIServer
- The feature should be disabled by default (opt-in only)
- Provide clear documentation for setup and configuration of the feature
- Ensure minimal performance impact on control plane operations
- Support various persistent volume types (AWS EBS, Azure Disk, etc.)
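The opt-in and disabled-by-default criteria above could be expressed roughly as follows. This is a minimal sketch, not a proposed API: the type `AuditLogPersistence`, its fields, and the helpers `Validate` and `Describe` are all hypothetical names chosen for illustration. The point it shows is that a boolean whose zero value is `false` keeps the feature off unless an operator sets it explicitly.

```go
package main

import (
	"errors"
	"fmt"
)

// AuditLogPersistence is a hypothetical opt-in configuration block.
// Field names are illustrative only and do not come from an existing API.
type AuditLogPersistence struct {
	// Enabled defaults to false (Go zero value), so the feature is opt-in.
	Enabled bool
	// StorageClassName selects the volume type (for example an EBS- or
	// Azure Disk-backed class). Empty means "use the cluster default".
	StorageClassName string
	// SizeGiB is the requested volume size; only checked when Enabled is true.
	SizeGiB int
}

// Validate rejects configurations that enable persistence without a usable size.
func (c AuditLogPersistence) Validate() error {
	if !c.Enabled {
		return nil // nothing to validate when the feature is off
	}
	if c.SizeGiB <= 0 {
		return errors.New("sizeGiB must be positive when audit log persistence is enabled")
	}
	return nil
}

// Describe returns a one-line, human-readable summary of the configuration.
func Describe(c AuditLogPersistence) string {
	if !c.Enabled {
		return "audit log persistence: disabled"
	}
	sc := c.StorageClassName
	if sc == "" {
		sc = "<cluster default>"
	}
	return fmt.Sprintf("audit log persistence: enabled, %d GiB on storage class %s", c.SizeGiB, sc)
}

func main() {
	var unset AuditLogPersistence // operator configured nothing
	fmt.Println(Describe(unset))

	optedIn := AuditLogPersistence{Enabled: true, SizeGiB: 10}
	if err := optedIn.Validate(); err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Println(Describe(optedIn))
}
```

Leaving `StorageClassName` empty to mean "cluster default" is one way to stay neutral across volume types; the actual design may choose differently.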
Technical Considerations
- Integration with existing audit log policies and configurations
- Proper log rotation and storage management
- Security compliance for persistent audit storage
- Performance overhead minimization during control plane operations
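For the log rotation and storage management consideration, a rough sizing rule can be derived from the kube-apiserver rotation flags `--audit-log-maxsize` (megabytes per file before rotation) and `--audit-log-maxbackup` (number of rotated files to keep). The sketch below assumes the worst case is one active file plus all retained backups, each at full size, with a headroom percentage on top. The function name `RequiredMB` and the headroom parameter are hypothetical, and compression or `--audit-log-maxage` pruning are ignored.

```go
package main

import (
	"errors"
	"fmt"
)

// RequiredMB estimates the volume capacity (in MB) needed for rotated audit logs.
// maxSizeMB and maxBackup mirror --audit-log-maxsize and --audit-log-maxbackup.
// headroomPercent is an illustrative safety margin, not a kube-apiserver setting.
func RequiredMB(maxSizeMB, maxBackup, headroomPercent int) (int, error) {
	if maxSizeMB <= 0 {
		return 0, errors.New("maxSizeMB must be positive")
	}
	if maxBackup <= 0 {
		// A maxbackup of 0 places no limit on the number of retained files,
		// so no finite bound can be computed from these two flags alone.
		return 0, fmt.Errorf("maxBackup=%d gives no upper bound on retained files", maxBackup)
	}
	if headroomPercent < 0 {
		return 0, errors.New("headroomPercent must not be negative")
	}
	// One active file plus maxBackup rotated files, all at full size.
	base := (maxBackup + 1) * maxSizeMB
	// Apply headroom and round up to a whole MB.
	return (base*(100+headroomPercent) + 99) / 100, nil
}

func main() {
	mb, err := RequiredMB(100, 10, 20)
	if err != nil {
		fmt.Println("cannot size volume:", err)
		return
	}
	fmt.Printf("request at least %d MB for audit logs\n", mb)
}
```

A calculation like this could also back the storage sizing guidance called for in the documentation requirements.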
Documentation Requirements
- Clear setup and configuration guide
- Examples of enabling the feature
- Best practices for storage sizing and rotation policies
- Troubleshooting guide for common issues
Parent Story
This story is related to OCPSTRAT-1896 (Dynamic Scaling for Hosted Control Planes) as an independent feature that enhances observability.
Definition of Done
- Feature is implemented and tested
- Documentation is complete and reviewed
- Feature can be enabled/disabled via configuration
- No significant performance regression
- Integration tests pass