Feature
Resolution: Unresolved
Medium
Product / Portfolio Work
100% To Do, 0% In Progress, 0% Done
Feature Overview (aka. Goal Summary)
Enable Technology Preview deployment of Confidential Containers with GPU support on OpenShift bare metal clusters equipped with NVIDIA H100 GPUs that support NVIDIA Confidential Computing.
This feature extends Confidential Containers' hardware-based Trusted Execution Environment (TEE) protections to GPU-accelerated workloads, protecting sensitive data and models during GPU computation through hardware-based memory encryption and attestation.
The Technology Preview provides early access to confidential GPU computing for AI/ML and HPC workloads, with potential expansion to NVIDIA B200 GPUs pending hardware availability for testing and validation.
Goals (aka. expected user outcomes)
Primary User Personas: AI/ML Engineers, Data Scientists, Security Engineers, Platform Administrators managing GPU workloads, Compliance Officers for sensitive compute workloads
Observable Functionality:
- Under Technology Preview support, platform administrators can deploy and manage Confidential Containers with GPU acceleration on bare metal OpenShift clusters equipped with NVIDIA H100 GPUs
- AI/ML engineers can run GPU-accelerated inference and training workloads within TEEs, protecting proprietary models and sensitive training data
- Security engineers can enforce hardware-based GPU memory isolation and attestation policies for sensitive compute workloads
- Data scientists can leverage GPU acceleration for confidential computing use cases without modifying existing containerized GPU applications
- Compliance teams can demonstrate GPU workload protection meeting regulatory requirements for sensitive data processing
Expanded Features:
- Extends existing Confidential Containers TEE capabilities to GPU-accelerated workloads
- Integrates NVIDIA Confidential Computing features with OpenShift
- Provides foundation for future GPU confidential computing enhancements based on Tech Preview feedback
- Enables exploration of confidential AI/ML workload patterns on OpenShift
Requirements (aka. Acceptance Criteria)
Functional Requirements:
- Support for NVIDIA H100 GPUs with NVIDIA Confidential Computing capabilities on bare metal hardware
- Exploratory support for NVIDIA B200 GPUs (subject to hardware availability for testing and validation)
- GPU memory encryption during computation within the TEE
- Attestation and verification of GPU TEE integrity before workload deployment
- Integration with the NVIDIA GPU Operator on OpenShift for GPU resource management
- Support for GPU passthrough to confidential containers
- RuntimeClass configuration for GPU-enabled confidential workloads
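As an illustration of the RuntimeClass and GPU passthrough requirements above, the following minimal Python sketch builds the two manifests involved: a RuntimeClass and a Pod that selects it while requesting a GPU. The RuntimeClass name `kata-cc-gpu`, the resource name `nvidia.com/gpu`, and the image reference are assumptions made for this example only. The names actually exposed on a cluster depend on the installed operator versions and are not specified by this feature.

```python
import json

# Hypothetical values, not taken from this feature description.
RUNTIME_CLASS_NAME = "kata-cc-gpu"  # assumed RuntimeClass / handler name
GPU_RESOURCE = "nvidia.com/gpu"     # assumed; passthrough setups may expose another name


def runtime_class(name, handler):
    """Build a RuntimeClass manifest (API group node.k8s.io/v1)."""
    return {
        "apiVersion": "node.k8s.io/v1",
        "kind": "RuntimeClass",
        "metadata": {"name": name},
        "handler": handler,
    }


def gpu_pod(name, image, gpus=1):
    """Build a Pod manifest that runs under the confidential RuntimeClass
    and requests `gpus` GPU devices through a resource limit."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "runtimeClassName": RUNTIME_CLASS_NAME,
            "containers": [
                {
                    "name": "workload",
                    "image": image,
                    "resources": {"limits": {GPU_RESOURCE: gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # JSON output can be piped to `oc apply -f -`.
    print(json.dumps(runtime_class(RUNTIME_CLASS_NAME, RUNTIME_CLASS_NAME), indent=2))
    print(json.dumps(gpu_pod("cc-gpu-demo", "registry.example.com/demo/inference:latest"), indent=2))
```

The Pod differs from a standard GPU Pod only in `spec.runtimeClassName`, which is the property the "without modifying existing containerized GPU applications" goal relies on.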
Non-Functional Requirements:
- Security: Hardware-enforced GPU memory encryption, GPU attestation, protection of models and data during GPU computation, secure key management for GPU encryption
- Usability: Clear Technology Preview limitations documentation, integration with existing GPU workflows, example configurations for common AI/ML frameworks
- Supportability: Technology Preview support level with clear feedback channels, known limitations documented, troubleshooting guidance for GPU-specific scenarios
Documentation Considerations
Required Documentation:
Installation Guide:
- Hardware prerequisites (specific NVIDIA H100 GPU models, supported server platforms)
- NVIDIA driver and firmware requirements for Confidential Computing
- NVIDIA GPU Operator configuration alongside the Confidential Containers operator
- Hardware detection and validation procedures for GPU TEE capabilities
Administrator Guide:
- Cluster configuration for GPU-enabled confidential computing
- Troubleshooting GPU-specific TEE issues
- Technology Preview limitations and workarounds
Developer Guide:
- RuntimeClass configuration for GPU-enabled confidential containers
- GPU attestation verification
- Framework compatibility (TensorFlow, PyTorch, CUDA applications)
- Migration path from standard GPU workloads to confidential GPU workloads
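The migration path listed above could, in the simplest case, amount to adding a RuntimeClass reference to an existing GPU Pod. The sketch below shows that idea in Python. The RuntimeClass name and the GPU resource name are hypothetical placeholders, and a real migration may need further changes (for example attestation or image policy settings) that this feature description does not detail.

```python
import copy

# Hypothetical names used only for this sketch.
ASSUMED_RUNTIME_CLASS = "kata-cc-gpu"
ASSUMED_GPU_RESOURCE = "nvidia.com/gpu"


def requests_gpu(pod, resource=ASSUMED_GPU_RESOURCE):
    """Return True if any container in the Pod has a limit for `resource`."""
    containers = pod.get("spec", {}).get("containers", [])
    return any(
        resource in c.get("resources", {}).get("limits", {})
        for c in containers
    )


def to_confidential(pod, runtime_class_name=ASSUMED_RUNTIME_CLASS):
    """Return a copy of `pod` that runs under the given RuntimeClass.

    The input manifest is left untouched, so the standard and the
    confidential variants can be compared side by side.
    """
    if not requests_gpu(pod):
        raise ValueError("Pod does not request a GPU; nothing to migrate")
    migrated = copy.deepcopy(pod)
    migrated["spec"]["runtimeClassName"] = runtime_class_name
    return migrated


if __name__ == "__main__":
    standard = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "trainer"},
        "spec": {
            "containers": [
                {
                    "name": "train",
                    "image": "registry.example.com/demo/train:latest",
                    "resources": {"limits": {ASSUMED_GPU_RESOURCE: 1}},
                }
            ]
        },
    }
    confidential = to_confidential(standard)
    print(confidential["spec"]["runtimeClassName"])
```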
Architecture Documentation:
- GPU confidential computing component overview
- Integration between Confidential Containers and NVIDIA Confidential Computing
- GPU memory encryption and attestation flow diagrams
- Security model for GPU workloads in TEEs
- Comparison with standard GPU workload architecture
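To make the attestation flow concrete, the sketch below models the gating decision in Python: a workload is admitted only when the CPU TEE check has passed and the GPU evidence satisfies a policy. Every name here (`GpuPolicy`, `cc_mode`, `measurement`, the sample measurement strings) is hypothetical and chosen for illustration. A real verifier would also validate the signature and certificate chain of the evidence, which this sketch deliberately omits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuPolicy:
    """Hypothetical policy: which GPU states are acceptable."""
    allowed_measurements: frozenset
    require_cc_mode: bool = True


def check_evidence(evidence, policy):
    """Return a list of human-readable policy violations (empty if none).

    `evidence` is a plain dict standing in for already-verified claims.
    """
    failures = []
    if policy.require_cc_mode and evidence.get("cc_mode") != "on":
        failures.append("GPU is not in confidential computing mode")
    if evidence.get("measurement") not in policy.allowed_measurements:
        failures.append("GPU measurement is not in the allow-list")
    return failures


def admit_workload(cpu_tee_ok, gpu_evidence, policy):
    """Admit only if both the CPU TEE and the GPU pass their checks."""
    return cpu_tee_ok and not check_evidence(gpu_evidence, policy)


if __name__ == "__main__":
    policy = GpuPolicy(allowed_measurements=frozenset({"sample-measurement-a"}))
    good = {"cc_mode": "on", "measurement": "sample-measurement-a"}
    bad = {"cc_mode": "off", "measurement": "unknown"}
    print(admit_workload(True, good, policy))
    print(check_evidence(bad, policy))
```

Returning the list of violations, rather than a bare boolean, keeps the reasons available for the troubleshooting guidance called for elsewhere in this feature.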
Hardware Compatibility Matrix:
- Supported NVIDIA H100 GPU models and configurations
- Known hardware limitations and compatibility issues
Technology Preview Statement:
- Explicit Technology Preview scope and limitations
- Features not yet supported (if any)
Release Notes:
- Technology Preview feature highlights
- Supported GPU models (H100, B200 status)
- Known limitations and issues
- Compatibility with existing Confidential Containers deployments