Feature
Resolution: Unresolved
Major
Product / Portfolio Work
Feature Overview (aka. Goal Summary)
Elevate the Confidential Cluster Operator from Developer Preview to Technology Preview status, delivering production-ready quality, comprehensive documentation, enhanced observability, and API stability for OpenShift confidential clusters on Microsoft Azure with AMD SEV-SNP.
This feature adds improved operational tooling, OpenShift Console integration, performance optimization, and expanded testing, while staying focused on the validated Azure/AMD SEV-SNP platform.
Technology Preview enables broader customer adoption, establishes support processes, and provides the foundation for General Availability while customers gain confidence deploying confidential workloads at scale.
Goals (aka. expected user outcomes)
Primary User Types/Personas:
- Enterprise Customers (Financial Services, Healthcare, Government): Can deploy confidential clusters in pre-production and staging environments with confidence in stability, performance, and supportability approaching production standards
- Cluster Administrators: Have comprehensive observability through OpenShift Console, robust troubleshooting tools, and operational confidence managing confidential cluster lifecycle
- Security & Compliance Teams: Can validate and audit attestation evidence, generate compliance reports, and integrate confidential clusters into security tooling
- OpenShift Site Reliability Engineers (SREs): Have monitoring, alerting, and diagnostic capabilities to operate confidential clusters at scale
- Red Hat Support Teams: Can effectively troubleshoot customer issues with mature diagnostic tools, runbooks, and escalation paths
- Field Engineers & Solution Architects: Can confidently recommend confidential clusters for near-production use cases with predictable performance and operational characteristics
Observable Functionality:
- All Developer Preview functionality from Phase II, now with production-quality implementation
- OpenShift Console UI displays attestation status, node confidential state, and operator health
- Observability dashboards provide attestation performance visibility
- Alert rules notify administrators of attestation failures and operator issues
- API/CRD stability with versioning and deprecation policies
- Improved error handling with actionable remediation guidance
- Automated recovery from transient attestation failures
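The automated recovery from transient attestation failures listed above could follow a retry loop with capped exponential backoff and jitter. The following is a minimal sketch under stated assumptions, not the operator's implementation: `TransientAttestationError`, `backoff_delays`, and `attest_with_retry` are hypothetical names, and the delay values are placeholders.

```python
import random
import time


class TransientAttestationError(Exception):
    """Hypothetical error type for failures worth retrying (e.g. a timeout)."""


def backoff_delays(base=1.0, factor=2.0, cap=30.0, attempts=5):
    """Return the un-jittered delay before each retry: base * factor**n, capped."""
    return [min(cap, base * factor ** n) for n in range(attempts)]


def attest_with_retry(attest, delays, sleep=time.sleep, rng=random.random):
    """Call attest() once, then once more per entry in delays, until it succeeds.

    Only TransientAttestationError is retried. Any other exception propagates
    immediately so that permanent failures surface to the administrator.
    """
    last_error = None
    for delay in [0.0] + list(delays):
        if delay:
            # Full jitter: sleep a random fraction of the computed delay.
            sleep(delay * rng())
        try:
            return attest()
        except TransientAttestationError as err:
            last_error = err
    # Retry budget exhausted: re-raise the last transient error.
    raise last_error
```

Passing `sleep` and `rng` as parameters keeps the loop deterministic under test; a real controller would more likely requeue the reconcile request than block in place.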
Requirements (aka. Acceptance Criteria):
Functional Requirements:
- Production-Quality Operator Implementation
  - Operator follows all OpenShift operator best practices and coding standards
  - Operator handles all error conditions gracefully with retry logic and backoff
  - Operator upgrade tested with no downtime to existing confidential clusters
  - Resource requests/limits properly configured for production workloads
- OpenShift Console Integration
  - Console overview page shows confidential cluster status and attestation health
  - Node details page displays SEV-SNP enabled state and attestation status
  - Operator details page shows configuration and operational status
  - Visual indicators for attestation failures with drill-down to details
  - Console actions for common operations (view attestation logs, refresh status)
- Enhanced Observability & Monitoring
  - Integration with OpenShift cluster monitoring operator
- Improved Operational Tooling
  - Enhanced must-gather plugin for confidential clusters:
    - All operator logs and CRD states
    - Node SEV-SNP configuration and firmware versions
    - Attestation attempt history and evidence
    - Trustee service connectivity diagnostics
    - Azure confidential VM metadata
  - Automated health checks and validation tools
  - Log aggregation and structured logging for troubleshooting
- API Stability & Versioning
  - CRD API version v1beta1 or v1 with stability guarantees
  - Clear deprecation policy for API changes
  - Backward compatibility maintained within Tech Preview
  - CRD field descriptions comprehensive and accurate
- Enhanced Installation Experience
  - Installation documentation covers all Azure regions and VM SKUs
  - Pre-flight validation checks before installation begins
  - Support for custom network configurations (VNet, subnet requirements)
  - Day 2 operations documented
  - Migration guide from Developer Preview to Tech Preview
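The alerting requirement under Enhanced Observability & Monitoring could be met with a `PrometheusRule` resource, the kind used by the Prometheus Operator that backs OpenShift cluster monitoring. The sketch below builds such a manifest as a plain dictionary. The metric name `confidential_cluster_attestation_failures_total`, the alert name, the resource name, and the example namespace are all assumptions for illustration; this document does not define the operator's actual metrics.

```python
import json


def attestation_alert_rule(namespace, failure_threshold=0, window="10m"):
    """Build a PrometheusRule manifest that fires on recent attestation failures.

    ASSUMPTION: the counter name below is hypothetical; substitute whatever
    metric the operator actually exports.
    """
    expr = (
        "increase(confidential_cluster_attestation_failures_total"
        f"[{window}]) > {failure_threshold}"
    )
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PrometheusRule",
        "metadata": {
            "name": "confidential-cluster-attestation",  # hypothetical name
            "namespace": namespace,
        },
        "spec": {
            "groups": [
                {
                    "name": "attestation.rules",
                    "rules": [
                        {
                            "alert": "ConfidentialClusterAttestationFailing",
                            "expr": expr,
                            # Condition must hold this long before firing.
                            "for": "5m",
                            "labels": {"severity": "warning"},
                            "annotations": {
                                "summary": "Node attestation is failing",
                                "description": (
                                    "Attestation failures were recorded in "
                                    f"the last {window}."
                                ),
                            },
                        }
                    ],
                }
            ]
        },
    }


if __name__ == "__main__":
    # JSON is accepted wherever YAML manifests are, so this output can be
    # saved to a file and applied directly.
    print(json.dumps(attestation_alert_rule("confidential-cluster-operator"), indent=2))
```

Generating the rule from code rather than hand-writing it keeps the threshold and window configurable per environment.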
| Deployment considerations | List applicable specific needs (N/A = not applicable) |
| --- | --- |
| Self-managed, managed, or both | Self-managed primary; ARO compatibility validated but not officially supported in Tech Preview; document ARO requirements for GA |
| Classic (standalone cluster) | Yes - fully supported and primary deployment model |
| Hosted control planes | Not supported; architecture refinements documented for future HyperShift integration |
| Multi node, Compact (three node), or Single node (SNO), or all | Multi-node (4+ nodes) fully supported; Compact (3-node) fully supported; SNO under evaluation, with a decision to be made on the Tech Preview timeline |
| Connected / Restricted Network | Both supported |
| Architectures, e.g. x86_64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | x86_64 only (Azure AMD SEV-SNP confidential VMs); other architectures explicitly not supported |
| Operator compatibility | Dependencies: Machine API, Machine Config Operator, Cluster Version Operator; OLM integration for Tech Preview |
| Backport needed (list applicable versions) | N/A - new capability targeting next OpenShift minor release (e.g., 4.X) |
| UI need (e.g. OpenShift Console, dynamic plugin, OCM) | OpenShift Console integration required |
| Other (please specify) | |
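The deployment considerations above lend themselves to the pre-flight validation called for in the requirements. The following is a minimal sketch of such a check, assuming a hypothetical `ClusterPlan` summary of the intended install; the field names and string values are illustrative and do not correspond to a real installer type. Treating fewer than three nodes as unsupported is a simplification of the undecided SNO status.

```python
from dataclasses import dataclass


@dataclass
class ClusterPlan:
    """Hypothetical summary of a planned install; not a real installer type."""
    platform: str
    architecture: str
    tee: str
    node_count: int
    hosted_control_plane: bool = False


def preflight(plan):
    """Return a list of human-readable problems; an empty list means the plan passes."""
    problems = []
    if plan.platform != "azure":
        problems.append(f"platform '{plan.platform}' is not supported; only Azure is in scope")
    if plan.architecture != "x86_64":
        problems.append(f"architecture '{plan.architecture}' is not supported; x86_64 only")
    if plan.tee != "sev-snp":
        problems.append(f"TEE '{plan.tee}' is not supported; AMD SEV-SNP only")
    if plan.hosted_control_plane:
        problems.append("hosted control planes are not supported")
    if plan.node_count < 3:
        # Compact (3-node) and multi-node (4+) are supported; SNO is undecided.
        problems.append(f"{plan.node_count}-node topology is not supported; use 3 or more nodes")
    return problems


if __name__ == "__main__":
    for problem in preflight(ClusterPlan("azure", "x86_64", "sev-snp", 1)):
        print("FAIL:", problem)
```

Returning every problem at once, rather than stopping at the first, lets an administrator fix the whole plan in one pass before installation begins.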
Out of Scope
Explicitly Not Supported in Tech Preview:
- Other Cloud Providers: AWS and GCP support will be scoped during Phase IV
- Other TEE Technologies: Intel TDX, ARM CCA, other AMD technologies not supported
- Managed Services: ARO (Azure Red Hat OpenShift), ROSA, OSD integration
- Hosted Control Planes: HyperShift integration
- Production Support: Tech Preview has defined but limited support; not full production SLA
- Upgrade Paths: No in-place upgrade from Developer Preview to Tech Preview; a migration guide is provided instead
- Advanced Observability: Capabilities beyond the Console UI, Prometheus dashboards, and alerts listed under Requirements
Background
Phase Progression Context:
- Phase I (Complete): Established architecture, upstream repository, technical socialization
- Phase II (Complete): Developer Preview delivered first working implementation on Azure with AMD SEV-SNP; validated architecture with 5-10 early customers; collected critical feedback on installation, operations, and performance
- Phase III (This Phase): Technology Preview elevates quality to production-adjacent level; addresses Developer Preview feedback; adds Console UI, monitoring, and operational tooling; establishes GA readiness
Documentation Considerations
Complete Product Documentation Required:
- Planning & Architecture
  - Solution overview and confidential computing concepts
  - Security architecture and threat model
- Installation & Configuration
  - Prerequisites checklist with validation commands
  - Azure subscription preparation (quotas, permissions, resources)
  - Configuration reference for all CRDs and parameters
  - Custom network configuration scenarios
  - Troubleshooting installation failures
- Operations & Administration
  - Day 2 operations guide
  - Monitoring and alerting configuration
  - Node lifecycle management (add, remove, replace, maintain)
  - Attestation policy management
  - Upgrading confidential clusters
- Troubleshooting & Support
  - Common error messages and solutions
  - Diagnostic commands and data collection
- Release Information
  - Release notes with new features and bug fixes
  - Known limitations and unsupported scenarios
  - Tech Preview support policy
Customer Considerations
Customer Expectations for Tech Preview:
- Near-Production Quality: Stability, performance, and supportability significantly better than Developer Preview
- Comprehensive Documentation: Able to deploy and operate without constant engineering support
- Defined Support: Clear support boundaries and response expectations
- API Stability: APIs won't break during Tech Preview; migration path to GA defined
- Production Planning: Confidence to plan production deployment for GA timeframe