Type: Feature
Resolution: Unresolved
Priority: Major
Feature Overview (aka. Goal Summary)
The OpenShift Cloud Credential Operator (CCO) currently determines its operating mode (mint, passthrough, or insufficient) only once, during installation. When cloud provider credentials are subsequently modified or revoked, or their permissions change, CCO cannot adapt and remains in a failed state until an administrator intervenes manually to restore cluster functionality.
Customers would like us to implement automatic mode re-evaluation capabilities that enable CCO to continuously monitor credential capabilities and dynamically switch between operating modes when permission changes are detected. This includes responding to credential sync failures and periodic health assessments.
Doing so eliminates manual recovery procedures, improves cluster self-healing, reduces operational overhead, and makes the system more resilient. This directly addresses the operational burden highlighted in RFE-7794, where administrators must manually adjust credentials or change modes when failures occur.
Goals (aka. expected user outcomes)
With this feature, administrators have a CCO that continuously monitors credential capabilities and switches operating modes automatically when permission changes are detected, whether the re-evaluation is triggered by credential sync failures or by periodic health assessments.
Primary Goals
- Automated Recovery: Enable CCO to automatically detect and recover from credential permission changes without manual administrator intervention
- Dynamic Mode Switching: Allow CCO to transition between operating modes (mint, passthrough, insufficient) based on real-time credential capabilities
- Improved Cluster Resilience: Reduce cluster downtime and credential sync failures caused by credential permission changes
- Operational Efficiency: Minimize manual remediation steps required when cloud credentials are rotated or revoked, or have their permissions reduced
Secondary Goals
- Accurate Status Reporting: Ensure mode annotations and status reflect actual current capabilities
- Proactive Issue Detection: Identify credential capability degradation before complete failure
- Backward Compatibility: Maintain existing CCO functionality while adding new re-evaluation capabilities
Requirements (aka. Acceptance Criteria):
Functional Requirements
FR-1: Credential Monitoring
- FR-1.1: CCO must continuously monitor the cloud-credentials Secret for changes
- FR-1.2: CCO must detect permission changes in cloud provider credentials
- FR-1.3: CCO must track credential capability changes over time
FR-2: Mode Re-evaluation Engine
- FR-2.1: CCO must implement a re-evaluation mechanism that can assess current credential capabilities
- FR-2.2: System must support re-evaluation triggers based on:
  - Credential sync failures due to permission issues
  - Detected changes in credential capabilities
  - Periodic health checks (configurable interval)
- FR-2.3: Re-evaluation must follow the same logic as initial mode determination
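As an illustration of FR-2.3, the following minimal sketch shows one way re-evaluation could reuse the same decision rule as initial mode determination. It is not CCO's actual implementation: the `Capabilities` type, its fields, and both function names are hypothetical, and the rule "prefer mint, then passthrough, otherwise insufficient" is an assumption drawn from the mode ordering used in this document.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Capabilities:
    """Hypothetical result of probing the root cloud credential."""
    can_mint: bool         # credential may create new, scoped credentials
    can_passthrough: bool  # credential itself holds the permissions components need


def determine_mode(caps: Capabilities) -> str:
    """Pick the most capable mode the credential supports (assumed ordering)."""
    if caps.can_mint:
        return "mint"
    if caps.can_passthrough:
        return "passthrough"
    return "insufficient"


def reevaluate(current_mode: str, caps: Capabilities) -> Tuple[str, bool]:
    """Re-run the initial decision rule and report whether the mode changed."""
    new_mode = determine_mode(caps)
    return new_mode, new_mode != current_mode
```

For example, `reevaluate("mint", Capabilities(can_mint=False, can_passthrough=True))` yields `("passthrough", True)`, which corresponds to the scenario in AC-1.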
FR-3: Dynamic Mode Switching
- FR-3.1: CCO must support automatic transitions between all operating modes:
  - mint → passthrough
  - mint → insufficient
  - passthrough → mint
  - passthrough → insufficient
  - insufficient → mint
  - insufficient → passthrough
- FR-3.2: Mode transitions must be atomic and safe
- FR-3.3: System must handle partial credential restoration scenarios
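One possible reading of "atomic and safe" in FR-3.2 is a compare-and-swap: a transition is committed only if the mode is still the one the re-evaluation started from. The sketch below illustrates that idea under that assumption. `ModeState`, `transition`, and the in-memory `events` list are hypothetical names, and a real operator would persist the mode (for example in an annotation) rather than in process memory.

```python
import threading
from typing import List, Tuple

VALID_MODES = {"mint", "passthrough", "insufficient"}


class ModeState:
    """Holds the current mode and applies transitions under a lock."""

    def __init__(self, initial: str) -> None:
        if initial not in VALID_MODES:
            raise ValueError(f"unknown mode: {initial}")
        self._lock = threading.Lock()
        self._mode = initial
        # Record of (old, new) pairs, standing in for emitted events (FR-4.1).
        self.events: List[Tuple[str, str]] = []

    @property
    def mode(self) -> str:
        with self._lock:
            return self._mode

    def transition(self, expected: str, new: str) -> bool:
        """Switch to `new` only if the mode is still `expected`.

        Returns False, leaving the state untouched, when another
        transition won the race or when nothing would change.
        """
        if new not in VALID_MODES:
            raise ValueError(f"unknown mode: {new}")
        with self._lock:
            if self._mode != expected or new == expected:
                return False
            self._mode = new
            self.events.append((expected, new))
            return True
```

Because the check and the write happen under the same lock, two concurrent re-evaluations cannot both commit a transition from the same starting mode.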
FR-4: Event Handling and Logging
- FR-4.1: CCO must generate appropriate events for mode changes
- FR-4.2: All re-evaluation activities must be logged with appropriate severity levels
- FR-4.3: Failed re-evaluation attempts must be logged with detailed error information
Non-Functional Requirements
NFR-1: Performance
- NFR-1.1: The re-evaluation process must not degrade existing CCO performance
- NFR-1.2: Credential checks must be efficient and not overload cloud provider APIs
- NFR-1.3: Re-evaluation frequency must be configurable to balance responsiveness with resource usage
NFR-2: Reliability
- NFR-2.1: Re-evaluation mechanism must be robust and handle cloud provider API failures gracefully
- NFR-2.2: System must prevent infinite re-evaluation loops
- NFR-2.3: Failed re-evaluations must not affect current cluster operations
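NFR-2.1 and NFR-2.3 can be read together as "a failed re-evaluation leaves the current mode in place". The sketch below shows that behavior under the assumption that transient provider errors (unreachable or throttled API) can be told apart from a real permission result. `TransientAPIError`, `safe_reevaluate`, and the `probe` callable are hypothetical names introduced for illustration.

```python
from typing import Callable, Optional, Tuple


class TransientAPIError(Exception):
    """Hypothetical error: the provider API was unreachable or throttled."""


def safe_reevaluate(
    current_mode: str, probe: Callable[[], str]
) -> Tuple[str, Optional[str]]:
    """Run `probe` and return (mode, warning).

    `probe` is assumed to return the mode the credential currently
    supports. If it fails transiently, the current mode is kept and a
    warning message is returned for logging (FR-4.3).
    """
    try:
        new_mode = probe()
    except TransientAPIError as err:
        return current_mode, f"re-evaluation skipped: {err}"
    return new_mode, None
```

The design choice here is conservative: a probe that cannot complete is not treated as evidence that permissions were lost, so an API outage alone never moves the cluster to insufficient mode.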
NFR-3: Security
- NFR-3.1: Credential validation must maintain existing security standards
- NFR-3.2: Re-evaluation process must not expose sensitive credential information in logs
- NFR-3.3: Mode transitions must preserve security boundaries
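For NFR-3.2, one simple approach is to redact every value of a credential Secret before it reaches a log line, keeping only the key names for debugging. The helper below is a hypothetical sketch of that approach, not an existing CCO function.

```python
from typing import Dict, Mapping


def redact_secret_data(data: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy that is safe to log: key names kept, all values masked."""
    return {key: "<redacted>" for key in data}
```

Masking all values, rather than only keys known to be sensitive, avoids leaking a field that a provider adds later.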
NFR-4: Compatibility
- NFR-4.1: Feature must be backward compatible with existing CCO deployments
- NFR-4.2: Must support all currently supported cloud providers
- NFR-4.3: Existing APIs and interfaces must remain unchanged
Acceptance Criteria
AC-1: Credential Permission Revocation Recovery
- Given: CCO is operating in mint mode with full permissions
- When: Cloud provider credentials have mint permissions revoked but retain passthrough permissions
- Then: CCO automatically detects the change and transitions to passthrough mode within a configurable time limit
- And: All dependent components continue operating without manual intervention
AC-2: Credential Restoration Detection
- Given: CCO is operating in insufficient mode due to limited permissions
- When: Cloud provider credentials are updated with mint permissions
- Then: CCO automatically detects the enhanced permissions and transitions to mint mode
- And: Mode annotation is updated to reflect new capabilities
AC-3: Credential Sync Failure Handling
- Given: CCO is experiencing credential sync failures due to permission issues
- When: Re-evaluation is triggered by repeated sync failures
- Then: CCO assesses current credential capabilities and adjusts mode accordingly
- And: Appropriate events and logs are generated documenting the mode change
AC-4: Configurable Re-evaluation Frequency
- Given: Administrator needs to tune re-evaluation frequency
- When: Re-evaluation interval is configured via operator configuration
- Then: CCO respects the configured interval for periodic credential checks
- And: Emergency re-evaluation still occurs on sync failures regardless of interval
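The interaction between the periodic interval and the emergency trigger in AC-4 can be sketched as a single decision function. All names and the default threshold of three consecutive sync failures are assumptions made for illustration; this document does not specify a threshold.

```python
def should_reevaluate(
    now: float,
    last_evaluation: float,
    interval_seconds: float,
    consecutive_sync_failures: int,
    failure_threshold: int = 3,  # assumed default, not specified in this document
) -> bool:
    """Decide whether a re-evaluation is due.

    Repeated sync failures trigger an emergency re-evaluation regardless
    of the interval; otherwise the configured interval must have elapsed.
    """
    if consecutive_sync_failures >= failure_threshold:
        return True
    return now - last_evaluation >= interval_seconds
```

With a 600-second interval, a check 100 seconds after the last evaluation returns False when there are no sync failures, and True once three consecutive failures have been recorded.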
AC-5: Prevention of Re-evaluation Loops
- Given: CCO detects inconsistent credential states
- When: Multiple re-evaluation attempts occur within a short timeframe
- Then: CCO implements a backoff strategy to prevent excessive API calls
- And: System stabilizes without impacting cluster operations
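AC-5 does not name a specific backoff strategy. A common choice is capped exponential backoff, sketched below. The function name, the 5-second base, and the 300-second cap are illustrative assumptions only.

```python
def backoff_delay(attempt: int, base: float = 5.0, cap: float = 300.0) -> float:
    """Seconds to wait before re-evaluation attempt number `attempt` (0-based).

    The delay doubles with each attempt and never exceeds `cap`.
    """
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    # Bound the exponent so very large attempt counts cannot overflow a float.
    exponent = min(attempt, 32)
    return min(cap, base * (2 ** exponent))
```

Under these assumed defaults the delays are 5, 10, 20, 40 seconds and so on, reaching the 300-second cap from the seventh attempt onward, which bounds the rate of cloud provider API calls during an unstable period.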
AC-6: Multi-Cloud Provider Support
- Given: Cluster is deployed on any supported cloud provider (AWS, Azure, GCP, etc.)
- When: Credential permissions change on that cloud provider
- Then: Re-evaluation mechanism works consistently across all supported platforms
- And: Provider-specific permission models are correctly handled
AC-7: Backward Compatibility
- Given: Existing OpenShift cluster with current CCO implementation
- When: Cluster is upgraded to include automatic re-evaluation feature
- Then: All existing functionality continues to work without modification
- And: New re-evaluation capabilities are automatically enabled
Together, these requirements address the core need identified in RFE-7794: automatic CCO mode re-evaluation that is robust, scalable, and maintainable.
Anyone reviewing this Feature needs to know which deployment configurations the Feature will or will not apply to once it has been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out of scope for a given release, provide the OCPSTRAT for the configuration to be supported in the future.
| Deployment considerations | List applicable specific needs (N/A = not applicable) |
| Self-managed, managed, or both | Self-managed (Managed would need to be implemented by Managed Services) |
| Classic (standalone cluster) | Yes |
| Hosted control planes | N/A |
| Multi node, Compact (three node), or Single node (SNO), or all | All |
| Connected / Restricted Network | Yes |
| Architectures, e.g. x86_64, Arm (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | All, wherever applicable (x86, Arm) |
| Operator compatibility | See Interoperability Considerations below |
| Backport needed (list applicable versions) | Nice-to-have |
| UI need (e.g. OpenShift Console, dynamic plugin, OCM) | TBD |
| Other (please specify) | |
Use Cases (Optional):
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
<your text here>
Questions to Answer (Optional):
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
<your text here>
Out of Scope
High-level list of items that are out of scope. Initial completion during Refinement status.
<your text here>
Background
Provide any additional context needed to frame the feature. Initial completion during Refinement status.
<your text here>
Customer Considerations
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
<your text here>
Documentation Considerations
Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.
<your text here>
Interoperability Considerations
Cloud Provider Compatibility
Multi-Cloud Provider Support and Compatibility
- Ensure consistent behavior across AWS, Azure, GCP, OpenStack, vSphere, and other supported platforms
- Account for provider-specific permission models and API rate limiting differences
- Handle varying credential refresh patterns and authentication mechanisms across providers
- Maintain compatibility with different cloud provider API versions and deprecation cycles
- Handle provider-specific credential validation methods and permission checking APIs
- Support regional differences in cloud provider service availability and API endpoints
OpenShift Version Compatibility
- Ensure backward compatibility with supported OpenShift versions (4.x series)
- Consider forward compatibility for future OpenShift releases
- Handle differences in CRD schemas and API versions across OpenShift versions
Operator Dependencies
- Coordinate with operators that depend on CCO-managed credentials:
  - Cluster Image Registry Operator
  - Ingress Operator
  - Machine API Operator
  - Cluster Storage Operator
- Ensure credential transitions don't disrupt dependent operator functionality
- Maintain consistent credential delivery during mode changes
Cluster API Integration
- Ensure compatibility with Cluster API providers that rely on CCO
- Handle credential updates for cluster scaling and node provisioning operations
- Coordinate with machine sets and infrastructure management components
Third-Party Tool Integration
Monitoring and Observability
- Maintain compatibility with existing monitoring solutions (Prometheus, Grafana, etc.)
- Ensure new metrics and events are consumable by third-party monitoring tools
- Preserve existing alerting integrations while adding new re-evaluation alerts
Security and Compliance Tools
- Ensure credential re-evaluation activities are auditable and compatible with security scanning tools
- Maintain compatibility with policy engines (OPA/Gatekeeper, Falco, etc.)
- Support integration with external secret management systems (Vault, CyberArk, etc.)
CI/CD and Automation
- Ensure automated deployment pipelines continue functioning during credential transitions
- Maintain compatibility with GitOps tools (ArgoCD, Flux) that may depend on credential state
- Support integration with external credential rotation systems
Upgrade and Migration Scenarios
Rolling Upgrades
- Ensure smooth operator upgrades without credential service disruption
- Handle mixed-version scenarios during cluster upgrades
- Maintain credential continuity during CCO operator updates
Multi-Cluster Environments
- Consider behavior in hub-spoke architectures (Red Hat Advanced Cluster Management)
- Ensure consistent credential management across federated clusters
- Handle credential synchronization in multi-cluster deployments
Disaster Recovery
- Maintain compatibility with backup and restore procedures
- Ensure re-evaluation capabilities work correctly after cluster recovery
- Support credential restoration in disaster recovery scenarios
Network and Infrastructure Considerations
Network Policies and Firewalls
- Ensure re-evaluation traffic is compatible with existing network policies
- Handle scenarios where cloud provider API access is restricted or proxied
- Consider impact on clusters with limited internet connectivity
Proxy and Air-Gapped Environments
- Maintain functionality in corporate proxy environments
- Support disconnected/air-gapped installations with limited cloud provider access
- Handle credential validation in environments with restricted egress
Security and Compliance Integration
RBAC and Permission Systems
- Ensure new re-evaluation capabilities respect existing RBAC configurations
- Maintain compatibility with custom security contexts and pod security standards
- Support integration with external authentication systems (LDAP, SAML, OIDC)
Audit and Compliance
- Ensure re-evaluation events are properly audited and logged
- Maintain compatibility with compliance frameworks (SOC2, FedRAMP, etc.)
- Support integration with SIEM systems for security event correlation
Data Persistence and State Management
Configuration Management
- Ensure compatibility with existing cluster configuration backup/restore tools
- Maintain state consistency during etcd operations and cluster migration
- Support configuration drift detection and remediation tools
Custom Resource Management
- Maintain backward compatibility with existing CCO custom resources
- Ensure proper handling of custom credential configurations
- Support migration of existing credential configurations to new re-evaluation model
Performance and Scalability
Large-Scale Deployments
- Consider impact on clusters with hundreds of namespaces and credential requests
- Ensure re-evaluation doesn't create performance bottlenecks in large environments
- Maintain compatibility with high-availability and multi-zone deployments
Resource Constraints
- Ensure functionality works within resource-constrained environments
- Maintain compatibility with edge computing and IoT deployments
- Support clusters with limited CPU and memory resources
These interoperability considerations are intended to ensure that automatic mode re-evaluation integrates with the broader OpenShift ecosystem and remains compatible with the existing tools, processes, and deployment patterns that organizations rely on.
Blocks: RFE-7794 Automatic Mode Re-Evaluation in Cloud Credential Operator (CCO) (Approved)