  1. OpenShift Container Platform (OCP) Strategy
  2. OCPSTRAT-2713

Confidential Clusters with remote attestation - Phase III

    • Product / Portfolio Work
    • OCPSTRAT-2023 OpenShift Confidential Clusters

      Feature Overview (aka. Goal Summary)  

      Elevate the Confidential Cluster Operator from Developer Preview to Technology Preview status, delivering production-ready quality, comprehensive documentation, enhanced observability, and API stability for OpenShift confidential clusters on Microsoft Azure with AMD SEV-SNP.

      This feature represents a significant maturity leap with improved operational tooling, OpenShift Console integration, performance optimization, and expanded testing, while maintaining focus on the validated Azure/AMD SEV-SNP platform.

      Technology Preview enables broader customer adoption, establishes support processes, and provides the foundation for General Availability while customers gain confidence deploying confidential workloads at scale.

      Goals (aka. expected user outcomes)

      Primary User Types/Personas:

      • Enterprise Customers (Financial Services, Healthcare, Government): Can deploy confidential clusters in pre-production and staging environments with confidence in stability, performance, and supportability approaching production standards
      • Cluster Administrators: Have comprehensive observability through OpenShift Console, robust troubleshooting tools, and operational confidence managing confidential cluster lifecycle
      • Security & Compliance Teams: Can validate and audit attestation evidence, generate compliance reports, and integrate confidential clusters into security tooling
      • OpenShift Site Reliability Engineers (SREs): Have monitoring, alerting, and diagnostic capabilities to operate confidential clusters at scale
      • Red Hat Support Teams: Can effectively troubleshoot customer issues with mature diagnostic tools, runbooks, and escalation paths
      • Field Engineers & Solution Architects: Can confidently recommend confidential clusters for near-production use cases with predictable performance and operational characteristics

      Observable Functionality:

      • All Developer Preview functionality from Phase II, now with production-quality implementation
      • OpenShift Console UI displays attestation status, node confidential state, and operator health
      • Observability dashboards provide attestation performance visibility
      • Alert rules notify administrators of attestation failures and operator issues
      • API/CRD stability with versioning and deprecation policies
      • Improved error handling with actionable remediation guidance
      • Automated recovery from transient attestation failures
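
      The automated recovery from transient attestation failures listed above could, for example, rest on retries with capped exponential backoff. The following is a minimal Python sketch of that pattern only, not the operator's implementation: `TransientAttestationError`, `attest_with_retry`, `backoff_delays`, and every default value are hypothetical names and numbers chosen for illustration.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class TransientAttestationError(Exception):
    """Hypothetical error type for attestation failures worth retrying."""


def backoff_delays(max_attempts: int, base: float = 1.0, cap: float = 60.0) -> List[float]:
    """Upper bound of the wait before each retry: base * 2**n, capped at `cap`."""
    return [min(cap, base * 2 ** n) for n in range(max_attempts - 1)]


def attest_with_retry(
    attest: Callable[[], T],
    max_attempts: int = 5,
    base: float = 1.0,
    cap: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call attest(), retrying transient failures with jittered backoff."""
    delays = backoff_delays(max_attempts, base, cap)
    for attempt in range(max_attempts):
        try:
            return attest()
        except TransientAttestationError:
            if attempt == max_attempts - 1:
                raise  # retries exhausted: surface the failure to the caller
            # Full jitter: wait a random time up to the exponential bound,
            # so many nodes retrying at once do not synchronize.
            sleep(random.uniform(0, delays[attempt]))
    raise ValueError("max_attempts must be at least 1")
```

      Only errors classified as transient are retried; any other exception propagates immediately, which keeps permanent failures (for example, a rejected attestation policy) visible instead of hiding them behind retries. The `sleep` parameter is injectable so the behaviour can be exercised without real waiting.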

      Requirements (aka. Acceptance Criteria):

      Functional Requirements:

      1. Production-Quality Operator Implementation
        • Operator follows all OpenShift operator best practices and coding standards
        • Operator handles all error conditions gracefully with retry logic and backoff
        • Operator upgrade tested with no downtime to existing confidential clusters
        • Resource requests/limits properly configured for production workloads
      2. OpenShift Console Integration
        • Console overview page shows confidential cluster status and attestation health
        • Node details page displays SEV-SNP enabled state and attestation status
        • Operator details page shows configuration and operational status
        • Visual indicators for attestation failures with drill-down to details
        • Console actions for common operations (view attestation logs, refresh status)
      3. Enhanced Observability & Monitoring
        • Integration with OpenShift cluster monitoring operator
      4. Improved Operational Tooling
        • Enhanced must-gather plugin for confidential clusters:
          • All operator logs and CRD states
          • Node SEV-SNP configuration and firmware versions
          • Attestation attempt history and evidence
          • Trustee service connectivity diagnostics
          • Azure confidential VM metadata
        • Automated health checks and validation tools
        • Log aggregation and structured logging for troubleshooting
      5. API Stability & Versioning
        • CRD API version v1beta1 or v1 with stability guarantees
        • Clear deprecation policy for API changes
        • Backward compatibility maintained within Tech Preview
        • CRD field descriptions comprehensive and accurate
      6. Enhanced Installation Experience
        • Installation documentation covers all Azure regions and VM SKUs
        • Pre-flight validation checks before installation begins
        • Support for custom network configurations (VNet, subnet requirements)
        • Day 2 operations documented
        • Migration guide from Developer Preview to Tech Preview
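
      As an illustration of the pre-flight validation requirement above, the sketch below checks a few installer inputs and returns findings that each carry a remediation hint. It is an assumption-laden example, not the installer's logic: `InstallRequest`, `preflight`, the subnet-size threshold, and the placeholder region and VM size names are all hypothetical, and the real supported lists would come from the product documentation.

```python
from dataclasses import dataclass
from typing import Collection, List


@dataclass(frozen=True)
class InstallRequest:
    """Hypothetical subset of installer inputs relevant to pre-flight checks."""
    region: str
    vm_size: str
    subnet_prefix_len: int


def preflight(
    req: InstallRequest,
    supported_regions: Collection[str],
    supported_vm_sizes: Collection[str],
    max_prefix_len: int = 24,  # placeholder threshold, not a documented requirement
) -> List[str]:
    """Return actionable findings; an empty list means every check passed."""
    findings = []
    if req.region not in supported_regions:
        findings.append(
            f"region '{req.region}' is not supported; "
            f"choose one of: {', '.join(sorted(supported_regions))}"
        )
    if req.vm_size not in supported_vm_sizes:
        findings.append(
            f"VM size '{req.vm_size}' is not supported; "
            f"choose one of: {', '.join(sorted(supported_vm_sizes))}"
        )
    if req.subnet_prefix_len > max_prefix_len:
        findings.append(
            f"subnet /{req.subnet_prefix_len} is too small; "
            f"use /{max_prefix_len} or larger"
        )
    return findings
```

      Collecting all findings before reporting, rather than stopping at the first failure, lets an administrator fix every problem in one pass before installation begins.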

       

       

      Deployment considerations (list applicable specific needs; N/A = not applicable)

      • Self-managed, managed, or both: Self-managed primary; ARO compatibility validated but not officially supported in Tech Preview; document ARO requirements for GA
      • Classic (standalone cluster): Yes; fully supported and the primary deployment model
      • Hosted control planes: Not supported; architecture refinements documented for future HyperShift integration
      • Multi node, Compact (three node), or Single node (SNO), or all: Multi-node (4+ nodes) fully supported; Compact (3-node) fully supported; SNO evaluated with Tech Preview timeline decision
      • Connected / Restricted Network: Both supported
      • Architectures, e.g. x86_64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x): x86_64 only (Azure AMD SEV-SNP confidential VMs); other architectures explicitly not supported
      • Operator compatibility: Dependencies: Machine API, Machine Config Operator, Cluster Version Operator; OLM integration for Tech Preview
      • Backport needed (list applicable versions): N/A; new capability targeting the next OpenShift minor release (e.g., 4.X)
      • UI need (e.g. OpenShift Console, dynamic plugin, OCM): OpenShift Console integration required
      • Other (please specify):

      Out of Scope

      Explicitly Not Supported in Tech Preview:

      • Other Cloud Providers: AWS and GCP support will be scoped during Phase IV
      • Other TEE Technologies: Intel TDX, ARM CCA, other AMD technologies not supported
      • Managed Services: ARO (Azure Red Hat OpenShift), ROSA, OSD integration
      • Hosted Control Planes: HyperShift integration
      • Production Support: Tech Preview has defined but limited support; not full production SLA
      • Upgrade Paths: No defined upgrade from Dev Preview to Tech Preview
      • Advanced Observability: Console UI, Prometheus dashboards, alerts

      Background

      Phase Progression Context:

      • Phase I (Complete): Established architecture, upstream repository, technical socialization
      • Phase II (Complete): Developer Preview delivered first working implementation on Azure with AMD SEV-SNP; validated architecture with 5-10 early customers; collected critical feedback on installation, operations, and performance
      • Phase III (This Phase): Technology Preview elevates quality to production-adjacent level; addresses Developer Preview feedback; adds Console UI, monitoring, and operational tooling; establishes GA readiness

      Documentation Considerations

      Complete Product Documentation Required:

      1. Planning & Architecture
        • Solution overview and confidential computing concepts
        • Security architecture and threat model
      2. Installation & Configuration
        • Prerequisites checklist with validation commands
        • Azure subscription preparation (quotas, permissions, resources)
        • Configuration reference for all CRDs and parameters
        • Custom network configuration scenarios
        • Troubleshooting installation failures
      3. Operations & Administration
        • Day 2 operations guide
        • Monitoring and alerting configuration
        • Node lifecycle management (add, remove, replace, maintain)
        • Attestation policy management
        • Upgrading confidential clusters
      4. Troubleshooting & Support
        • Common error messages and solutions
        • Diagnostic commands and data collection
      5. Release Information
        • Release notes with new features and bug fixes
        • Known limitations and unsupported scenarios
        • Tech Preview support policy

      Customer Considerations

      Customer Expectations for Tech Preview:

      • Near-Production Quality: Stability, performance, and supportability significantly better than Developer Preview
      • Comprehensive Documentation: Able to deploy and operate without constant engineering support
      • Defined Support: Clear support boundaries and response expectations
      • API Stability: APIs won't break during Tech Preview; migration path to GA defined
      • Production Planning: Confidence to plan production deployment for GA timeframe

              Marcos Entenza Garcia
              Clement Verna, Nitesh Narayan Lal
              Timothée Ravier
              Yalan Zhang
              Avani Bhatt
              Kyle Walker