Type: Feature
Resolution: Unresolved
Priority: High
Fix Version/s: OSC 1.12
Affects Version/s: None
Component/s: sandboxed-containers, sandboxed-containers-operator, trustee, trustee-operator

Activity Type: Product / Portfolio Work
Blocked: False
Blocked Reason: None
Ready: False
Color Status: Not Selected
Parent Link: OCPSTRAT-2027 OpenShift Confidential Containers
Hierarchy Progress Bar: 75% To Do, 25% In Progress, 0% Done
Release Note Text:
Package:
Description:
Release Note Type: Technology Preview
Release Note Status: Proposed
Product Documentation Required: Yes

Sprint: OSC 1.12 Backlog
Cost of Delay: 0
Target Version: OSC 1.12


Feature Overview (aka. Goal Summary)

Enable a Technology Preview deployment of Confidential Containers with GPU support on bare metal OpenShift clusters that use NVIDIA H100 GPUs with NVIDIA Confidential Computing capabilities.

This feature extends Confidential Containers' hardware-based Trusted Execution Environment (TEE) protections to GPU-accelerated workloads, protecting sensitive data and models during GPU computation through hardware-based memory encryption and attestation.

The Technology Preview provides early access to confidential GPU computing for AI/ML and HPC workloads, with potential expansion to NVIDIA B200 GPUs pending hardware availability for testing and validation.

Goals (aka. expected user outcomes)

Primary User Personas: AI/ML Engineers, Data Scientists, Security Engineers, Platform Administrators managing GPU workloads, Compliance Officers for sensitive compute workloads

Observable Functionality:

Platform administrators can, under Technology Preview support, deploy and manage Confidential Containers with GPU acceleration on bare metal OpenShift clusters equipped with NVIDIA H100 GPUs
AI/ML engineers can run GPU-accelerated inference and training workloads within TEEs, protecting proprietary models and sensitive training data
Security engineers can enforce hardware-based GPU memory isolation and attestation policies for sensitive compute workloads
Data scientists can leverage GPU acceleration for confidential computing use cases without modifying existing containerized GPU applications
Compliance teams can demonstrate GPU workload protection meeting regulatory requirements for sensitive data processing

Expanded Features:

Extends existing Confidential Containers TEE capabilities to GPU-accelerated workloads
Integrates NVIDIA Confidential Computing features with OpenShift
Provides foundation for future GPU confidential computing enhancements based on Tech Preview feedback
Enables exploration of confidential AI/ML workload patterns on OpenShift

Requirements (aka. Acceptance Criteria)

Functional Requirements:

Support for NVIDIA H100 GPU chips with NVIDIA Confidential Computing capabilities on bare metal hardware
Exploratory support for NVIDIA B200 GPUs (subject to hardware availability for testing and validation)
GPU memory encryption during computation within the TEE
Attestation and verification of GPU TEE integrity before workload deployment
Integration with the NVIDIA GPU Operator on OpenShift for GPU resource management
Support for GPU passthrough to confidential containers
RuntimeClass configuration for GPU-enabled confidential workloads
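
To illustrate the last two requirements, the following Python sketch builds a Pod manifest that pairs a RuntimeClass with a GPU resource request. It is only a sketch under stated assumptions: the RuntimeClass name `kata-cc-nvidia-gpu`, the extended resource name `nvidia.com/gpu`, and the image reference are placeholders chosen for the example, because this feature description does not specify the names the Technology Preview will use.

```python
import json

# Hypothetical names: this feature description does not define them.
ASSUMED_RUNTIME_CLASS = "kata-cc-nvidia-gpu"
ASSUMED_GPU_RESOURCE = "nvidia.com/gpu"


def confidential_gpu_pod(name, image, gpus=1,
                         runtime_class=ASSUMED_RUNTIME_CLASS,
                         gpu_resource=ASSUMED_GPU_RESOURCE):
    """Return a Pod manifest (as a dict) that requests GPUs under a RuntimeClass."""
    if gpus < 1:
        raise ValueError("a GPU workload must request at least one GPU")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Selects the runtime handler that runs the pod.
            "runtimeClassName": runtime_class,
            "containers": [{
                "name": name,
                "image": image,
                # Extended resources such as GPUs are requested through limits.
                "resources": {"limits": {gpu_resource: gpus}},
            }],
        },
    }


if __name__ == "__main__":
    pod = confidential_gpu_pod("inference", "registry.example.com/inference:latest")
    print(json.dumps(pod, indent=2))
```

The printed JSON can be saved to a file and applied to a cluster once the placeholder names are replaced with the ones documented for the release.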

Non-Functional Requirements:

Security: Hardware-enforced GPU memory encryption, GPU attestation, protection of models and data during GPU computation, secure key management for GPU encryption
Usability: Clear Technology Preview limitations documentation, integration with existing GPU workflows, example configurations for common AI/ML frameworks
Supportability: Technology Preview support level with clear feedback channels, known limitations documented, troubleshooting guidance for GPU-specific scenarios

Documentation Considerations

Required Documentation:

Installation Guide:

Hardware prerequisites (specific NVIDIA H100 GPU models, supported server platforms)
NVIDIA driver and firmware requirements for Confidential Computing
NVIDIA GPU Operator configuration alongside the Confidential Containers operator
Hardware detection and validation procedures for GPU TEE capabilities

Administrator Guide:

Cluster configuration for GPU-enabled confidential computing
Troubleshooting GPU-specific TEE issues
Technology Preview limitations and workarounds

Developer Guide:

RuntimeClass configuration for GPU-enabled confidential containers
GPU attestation verification
Framework compatibility (TensorFlow, PyTorch, CUDA applications)
Migration path from standard GPU workloads to confidential GPU workloads
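
One possible shape for the migration path is sketched below in Python: an existing GPU workload manifest is copied and only its `runtimeClassName` is changed, leaving the container image and GPU request untouched. The RuntimeClass name `kata-cc-nvidia-gpu` and the helper `to_confidential` are hypothetical, and a real migration may need further changes (for example attestation or resource settings) that this feature description does not spell out.

```python
import copy

# Hypothetical RuntimeClass name; the real one is not given in this feature.
ASSUMED_RUNTIME_CLASS = "kata-cc-nvidia-gpu"


def to_confidential(manifest, runtime_class=ASSUMED_RUNTIME_CLASS):
    """Return a copy of a Pod or Deployment manifest with runtimeClassName set.

    The input manifest is not modified.
    """
    migrated = copy.deepcopy(manifest)
    kind = migrated.get("kind")
    if kind == "Pod":
        pod_spec = migrated.setdefault("spec", {})
    elif kind == "Deployment":
        # For a Deployment the pod spec lives under spec.template.spec.
        pod_spec = (migrated.setdefault("spec", {})
                    .setdefault("template", {})
                    .setdefault("spec", {}))
    else:
        raise ValueError(f"unsupported kind: {kind!r}")
    pod_spec["runtimeClassName"] = runtime_class
    return migrated


if __name__ == "__main__":
    standard = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "trainer"},
        "spec": {"containers": [{"name": "trainer", "image": "example/trainer"}]},
    }
    print(to_confidential(standard)["spec"]["runtimeClassName"])
```

Working on a deep copy keeps the original manifest available for a side-by-side comparison or a rollback.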

Architecture Documentation:

GPU confidential computing component overview
Integration between Confidential Containers and NVIDIA Confidential Computing
GPU memory encryption and attestation flow diagrams
Security model for GPU workloads in TEEs
Comparison with standard GPU workload architecture
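
The attestation flow named in this list can be pictured as a gate that releases a secret (such as a model decryption key) only when both the CPU TEE and the GPU report acceptable evidence. The Python sketch below is a simplified illustration, not the Trustee API: the claim names (`cpu_tee_verified`, `gpu_evidence_verified`, `gpu_cc_mode`) and the accepted mode value `"on"` are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttestationClaims:
    """Hypothetical, already-verified claims; real evidence formats differ."""
    cpu_tee_verified: bool
    gpu_evidence_verified: bool
    gpu_cc_mode: str  # assumed values, e.g. "on" or "off"


def may_release_secret(claims: AttestationClaims) -> bool:
    """Allow release only if the CPU TEE and the GPU both pass the checks."""
    return (claims.cpu_tee_verified
            and claims.gpu_evidence_verified
            and claims.gpu_cc_mode == "on")


def release_secret(claims: AttestationClaims, secret: bytes) -> bytes:
    """Return the secret if attestation passed, otherwise refuse."""
    if not may_release_secret(claims):
        raise PermissionError("attestation failed; secret withheld")
    return secret


if __name__ == "__main__":
    good = AttestationClaims(True, True, "on")
    bad = AttestationClaims(True, False, "on")
    print(may_release_secret(good), may_release_secret(bad))
```

Treating the CPU and GPU checks as a single conjunction reflects the requirement that GPU TEE integrity is verified before the workload is deployed; a CPU TEE that passes on its own is not enough.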

Hardware Compatibility Matrix:

Supported NVIDIA H100 GPU models and configurations
Known hardware limitations and compatibility issues

Technology Preview Statement:

Explicit Technology Preview scope and limitations
Features not yet supported (if any)

Release Notes:

Technology Preview feature highlights
Supported GPU models (H100, B200 status)
Known limitations and issues
Compatibility with existing Confidential Containers deployments

relates to

RFE-7759 Confidential GPU support in RHEL 9.6 (status: Refinement)
KATA-4626 [Docs] Trustee-based Attestation for CoCo with NVIDIA GPU on Bare Metal (TP) (status: New)

links to

openshift/sandboxed-containers-operator#1738: feat(runtimeclass): add NVIDIA GPU support for kata runtime classe
