Feature
Resolution: Unresolved
Critical
Migrate HyperShift's AWS SDK dependencies from v1 to v2 to ensure continued security support, vendor maintenance, and compliance before the July 31, 2025 end-of-life deadline.
Market Problem
AWS is ending support for AWS SDK for Go v1 on July 31, 2025. After this date, SDK v1 will no longer receive:
- Security patches for discovered vulnerabilities
- Bug fixes for defects
- Updates for new AWS services or features
- Vendor support from AWS
HyperShift currently depends on AWS SDK v1 in 36 production code files, representing approximately 10,000 lines of AWS-dependent code. Staying on v1 past EOL creates critical risks for the groups listed below.
Who is affected:
- ROSA HCP (Red Hat OpenShift Service on AWS - Hosted Control Planes) customers
- ARO HCP (Azure Red Hat OpenShift - Hosted Control Planes) customers using AWS dependencies
- Self-managed HyperShift deployments on AWS
Impact of not migrating:
- Security risk: Vulnerabilities discovered in AWS SDK v1 after EOL will not be patched
- Compliance risk: Security audits will flag unsupported dependencies
- Technical debt: Migration becomes increasingly costly as the codebase evolves
- Support burden: AWS will not address SDK v1 issues after EOL
This is a time-critical engineering investment required to maintain security posture and vendor support for all AWS-based hosted control plane offerings.
Proposed Solution
Complete migration of HyperShift's AWS SDK usage from v1 to v2 before the July 31, 2025 EOL deadline, including:
Core Migration:
- Migrate all 36 production code files using AWS SDK v1 to v2 equivalents
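- Update configuration patterns from session-based (aws/session) to config-based (an aws.Config loaded through the v2 config package), with a context.Context passed to each API call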
- Modernize service clients to use v2 modular design
- Refactor error handling to use v2 error types (smithy.APIError)
- Update credential providers to v2 interfaces
- Migrate paginators and waiters to v2 explicit constructors
- Update testing infrastructure and mocks to v2 patterns
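The error-handling item above can be illustrated with a minimal sketch. In v2, service error codes are read by unwrapping to smithy.APIError with errors.As, rather than type-asserting to awserr.Error as in v1. To keep the sketch runnable with the standard library only, it declares a local stand-in interface (apiError) that mirrors two methods of smithy.APIError; the helper names errorCode, hasCode, and sampleError are hypothetical and are not taken from the HyperShift codebase.

```go
package main

import (
	"errors"
	"fmt"
)

// apiError is a local stand-in mirroring two methods of smithy-go's APIError
// interface (ErrorCode, ErrorMessage). Real code would use smithy.APIError
// from github.com/aws/smithy-go instead of declaring this type.
type apiError interface {
	error
	ErrorCode() string
	ErrorMessage() string
}

// fakeAPIError is a hypothetical error value used only to exercise the helpers.
type fakeAPIError struct{ code, msg string }

func (e *fakeAPIError) Error() string        { return e.code + ": " + e.msg }
func (e *fakeAPIError) ErrorCode() string    { return e.code }
func (e *fakeAPIError) ErrorMessage() string { return e.msg }

// errorCode returns the service error code found anywhere in err's wrap
// chain, or "" when err carries no API error (including when err is nil).
func errorCode(err error) string {
	var ae apiError
	if errors.As(err, &ae) {
		return ae.ErrorCode()
	}
	return ""
}

// hasCode reports whether err carries one of the given service error codes.
func hasCode(err error, codes ...string) bool {
	c := errorCode(err)
	if c == "" {
		return false
	}
	for _, want := range codes {
		if c == want {
			return true
		}
	}
	return false
}

// sampleError builds a wrapped API error, as a caller would typically see it.
func sampleError() error {
	base := &fakeAPIError{code: "InvalidSubnetID.NotFound", msg: "subnet does not exist"}
	return fmt.Errorf("validating subnet: %w", base)
}

func main() {
	err := sampleError()
	fmt.Println(errorCode(err))
	fmt.Println(hasCode(err, "InvalidSubnetID.NotFound"))
	fmt.Println(hasCode(err, "Throttling"))
}
```

Because errors.As walks the wrap chain, helpers of this shape keep working when callers add context with %w, which a direct type assertion would not.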
Dependency Updates:
- Update Cluster API Provider AWS (CAPA) to v2-compatible version
- Verify controller-runtime AWS SDK compatibility
Quality Assurance:
- Comprehensive unit test updates across all affected components
- Full E2E test validation on AWS platform
- Performance benchmarking to ensure no regressions
- Security scanning of new SDK version
Strategic Value
Customer Value
- Security assurance: Continued security patching protects customer infrastructure
- Compliance: Meets security audit requirements for supported dependencies
- Reliability: Access to AWS bug fixes maintains cluster stability
- Innovation: Enables future use of new AWS services and features
Business Impact
- Risk mitigation: Prevents security incidents from unpatched SDK vulnerabilities
- Compliance: Maintains SOC2, ISO27001, and FedRAMP compliance postures
- Support efficiency: AWS support available for SDK-related issues
- Cost avoidance: Prevents emergency migration costs if a security issue is discovered post-EOL
Success Criteria
Completion
- Zero AWS SDK v1 imports remain in HyperShift codebase
- All CAPA dependencies updated to v2-compatible versions
- Migration completed and shipped before July 31, 2025
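The "zero v1 imports" criterion lends itself to an automated check. The following is a minimal sketch, not an existing HyperShift tool: it parses only the import section of a Go source file and reports paths under the v1 module (github.com/aws/aws-sdk-go) while ignoring the v2 module (github.com/aws/aws-sdk-go-v2). The function names and the sample source are hypothetical, and deciding which files to scan (for example, whether to skip vendored code) is left out.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
	"strings"
)

// v1Module is the module path of AWS SDK for Go v1.
const v1Module = "github.com/aws/aws-sdk-go"

// isV1Import reports whether an import path belongs to the v1 module.
// Requiring an exact match or a "/" after the module path keeps
// github.com/aws/aws-sdk-go-v2/... from matching.
func isV1Import(path string) bool {
	return path == v1Module || strings.HasPrefix(path, v1Module+"/")
}

// v1Imports parses only the imports of one Go source file and returns
// the import paths that point at AWS SDK for Go v1.
func v1Imports(filename, src string) ([]string, error) {
	f, err := parser.ParseFile(token.NewFileSet(), filename, src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var hits []string
	for _, imp := range f.Imports {
		path, err := strconv.Unquote(imp.Path.Value)
		if err != nil {
			return nil, err
		}
		if isV1Import(path) {
			hits = append(hits, path)
		}
	}
	return hits, nil
}

// sample is a hypothetical file that mixes a v1 and a v2 import.
const sample = `package demo

import (
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go-v2/service/ec2"
)
`

func main() {
	hits, err := v1Imports("demo.go", sample)
	fmt.Println(hits, err)
}
```

A check like this could run in CI over the migrated packages so that a v1 import cannot be reintroduced unnoticed after an epic completes.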
Quality
- 100% unit test pass rate with AWS SDK v2
- 100% E2E test pass rate for AWS platform (ROSA HCP, standalone)
- Zero performance regression in AWS operations (EC2, S3, Route53, etc.)
- Zero functional regressions reported in QE or production
Adoption
- Target OpenShift release (4.20 or earlier) ships with migrated code
- All customer deployments receive migrated version through normal upgrade path
- Documentation updated with v2 usage patterns
Security
- Security scan shows zero critical/high vulnerabilities in AWS SDK dependency
- Compliance teams confirm supported dependency status
- No security incidents related to AWS SDK dependencies post-migration
Scope
Affected Components (36 files)
Critical Production Components:
1. CLI Infrastructure Management (cmd/infra/aws/)
- Infrastructure creation/destruction workflows
- AWS services: EC2, Route53, S3, IAM, ELB/ELBv2, RAM, STS
- Components: VPC/subnet/security group creation, NAT gateways, DNS zone management, IAM role/policy provisioning, load balancer setup
- Impact: Core hypershift CLI commands for AWS infrastructure provisioning
2. HyperShift Operator - AWS Platform Controller
- hypershift-operator/controllers/platform/aws/ - AWS Endpoint Service management
- hypershift-operator/controllers/hostedcluster/ - S3 OIDC bucket operations
- hypershift-operator/controllers/nodepool/ - EC2 subnet validation and instance metrics
- AWS services: EC2, S3, ELBv2
- Impact: Core reconciliation logic for AWS-hosted clusters (ROSA HCP, standalone AWS)
3. Control Plane Operator - Private Link Controller
- control-plane-operator/controllers/awsprivatelink/ - VPC Endpoint Service management
- control-plane-operator/controllers/awsprivatelink/route53.go - Private DNS integration
- AWS services: EC2, Route53, KMS
- Impact: Critical for private cluster functionality; affects PrivateLink-based customer deployments
4. Shared AWS Utilities (support/awsutil/)
- STS role assumption with web identity federation
- AWS error code handling and retry logic
- Security group operations
- Impact: Foundation utilities used across all AWS components
5. Karpenter Operator - Machine Approver
- karpenter-operator/controllers/karpenter/machine_approver.go
- EC2 instance verification for auto-scaling
- Impact: NodePool auto-scaling with Karpenter
6. Etcd Backup Service
- etcd-backup/etcdbackup.go - S3 snapshot uploads
- Impact: Cluster backup and disaster recovery
7. CLI Utilities
- Bastion host creation/destruction
- Console log retrieval from EC2 instances
- Impact: Debugging and troubleshooting workflows
8. E2E Test Infrastructure (9 files)
- AWS resource creation/cleanup for tests
- Shared OIDC provider setup
- KMS encryption validation, tag propagation testing
- Impact: CI/CD test reliability and coverage
9. Contrib Tools
- Zone cleanup utilities
- Impact: Operations and maintenance scripts
AWS Services Utilized (7 services)
- EC2 (16 files) - Instance management, VPC/networking, security groups, volumes
- Route53 (6 files) - DNS zones, record sets, private hosted zones
- S3 (5 files) - OIDC provider buckets, etcd backup storage, multipart uploads
- IAM (4 files) - Roles, policies, OIDC identity providers
- ELBv2 (4 files) - Application/Network Load Balancers, Endpoint Services
- ELB Classic (2 files) - Legacy load balancer operations
- Additional: KMS (encryption), RAM (shared VPC), STS (role assumption), Resource Groups Tagging API
Technical Patterns that might require migration
- Session Management: aws/session package usage across all components
- Credentials: Static credentials, STS web identity, role assumption patterns
- Error Handling: awserr package for error code extraction and retry logic
- Testing Interfaces: Heavy use of _iface packages (ec2iface, s3iface, etc.) for mocking
- Request Customization: Custom user agent headers ("openshift.io/hypershift"), retry handlers
- S3 Multipart: s3/s3manager for efficient large file uploads
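For the testing-interfaces pattern, v2 does not ship generated _iface packages, so a common replacement is a small interface declared by the consumer that lists only the operations it calls. The sketch below shows that shape under stated assumptions: describeSubnetsInput and describeSubnetsOutput are simplified stand-ins for the real ec2 input and output types, the real v2 client method also takes variadic per-call option functions, and validateSubnets, fakeEC2, and check are hypothetical names rather than HyperShift code.

```go
package main

import (
	"context"
	"fmt"
)

// Simplified stand-ins for the SDK's DescribeSubnets input and output types.
type describeSubnetsInput struct{ SubnetIDs []string }
type describeSubnetsOutput struct{ Found []string }

// subnetDescriber is a consumer-defined interface listing only the one
// operation this code needs, replacing a wide generated interface.
type subnetDescriber interface {
	DescribeSubnets(ctx context.Context, in *describeSubnetsInput) (*describeSubnetsOutput, error)
}

// validateSubnets returns an error if any requested subnet ID is missing
// from the response.
func validateSubnets(ctx context.Context, c subnetDescriber, ids []string) error {
	out, err := c.DescribeSubnets(ctx, &describeSubnetsInput{SubnetIDs: ids})
	if err != nil {
		return fmt.Errorf("describing subnets: %w", err)
	}
	found := make(map[string]bool, len(out.Found))
	for _, id := range out.Found {
		found[id] = true
	}
	for _, id := range ids {
		if !found[id] {
			return fmt.Errorf("subnet %q not found", id)
		}
	}
	return nil
}

// fakeEC2 is a hand-written test double. Simplification: it omits unknown
// IDs from the result, whereas the real EC2 API rejects them with an error.
type fakeEC2 struct{ existing []string }

func (f *fakeEC2) DescribeSubnets(_ context.Context, in *describeSubnetsInput) (*describeSubnetsOutput, error) {
	out := &describeSubnetsOutput{}
	for _, want := range in.SubnetIDs {
		for _, have := range f.existing {
			if want == have {
				out.Found = append(out.Found, want)
			}
		}
	}
	return out, nil
}

// check runs validateSubnets against a fake that knows the given subnets.
func check(existing []string, ids ...string) error {
	return validateSubnets(context.Background(), &fakeEC2{existing: existing}, ids)
}

func main() {
	fmt.Println(check([]string{"subnet-a", "subnet-b"}, "subnet-a"))
	fmt.Println(check([]string{"subnet-a"}, "subnet-z"))
}
```

Keeping the interface next to its consumer means a real v2 client satisfies it without an adapter once the stand-in types are swapped for the SDK types, and each fake stays as small as the code under test requires.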
Epics (Planned)
Phase 1: Test Infrastructure & CLI tools
- Epic 1: Migrate E2E test infrastructure to validate patterns and create examples
- Epic 2: Migrate CLI infrastructure commands (cmd/infra/aws/) - largest code surface
Phase 2: Operators & supporting components
- Epic 3: Migrate HyperShift operator AWS controllers
- Epic 4: Migrate Control Plane operator AWS controllers (PrivateLink, Route53)
- Epic 5: Migrate etcd-backup, contrib tools, and aws-encryption-provider
Phase 3: Docs
- Epic 6: Update dev documentation, migration guides, developer onboarding materials
Out of Scope
- Feature additions beyond migration requirements
- Performance optimizations not directly related to SDK migration
- Changes to non-AWS platform support (Azure, PowerVS, KubeVirt, OpenStack, Agent)
- Refactoring of AWS infrastructure management patterns (keep existing patterns, just update SDK)
Dependencies
| Dependency | SDKv1 | SDKv2 | Status | Impact |
| CAPA | ❌ None | ✅ Fully migrated | ✅ Complete | No blocker - can use as reference |
| Karpenter | ❌ None | ✅ Fully migrated | ✅ Complete | No blocker - can use as reference |
| controller-runtime | ❌ None | ❌ None | ✅ N/A | No impact - cloud-agnostic |
| HyperShift | ⚠️ v1.55.7 | Indirect only | 🔄 To migrate | Primary work needed |
Hard Deadline:
- July 31, 2025: AWS SDK v1 end-of-life - MUST ship before this date
References
- AWS SDK v1 EOL Announcement: https://aws.amazon.com/blogs/developer/announcing-end-of-support-for-aws-sdk-for-go-v1-on-july-31-2025/
- AWS SDK v2 Migration Guide: https://docs.aws.amazon.com/sdk-for-go/v2/developer-guide/migrate-gosdk.html
- CAPA Upstream Issue: https://github.com/kubernetes-sigs/cluster-api-provider-aws/issues/2225
- Original Epic: CNTRLPLANE-886