  1. OpenShift Container Platform (OCP) Strategy
  2. OCPSTRAT-2698

Migrate HyperShift to AWS SDK for Go v2


    • Product / Portfolio Work
    • None
    • 75% To Do, 13% In Progress, 13% Done
      • Color Status: Green
      • Status summary:
        • PR #7386 (STS migration to AWS SDK v2) actively under review with 11 reviews
        • Updated Jan 27 - making progress toward merge
        • CNTRLPLANE-2056 breakdown to multiple stories in progress
      • Risks:
        • AWS SDK v1 EOL deadline is July 31, 2026 - timeline requires close monitoring

      Migrate HyperShift's AWS SDK dependencies from v1 to v2 to ensure continued security support, vendor maintenance, and compliance before the July 31, 2025 end-of-life deadline.

      Market Problem

      AWS is ending support for AWS SDK for Go v1 on July 31, 2025. After this date, SDK v1 will no longer receive:

      • Security patches for discovered vulnerabilities
      • Bug fixes for defects
      • Updates for new AWS services or features
      • Vendor support from AWS

      HyperShift currently depends on AWS SDK v1 across 36 production code files, representing approximately 10,000 lines of AWS-dependent code. This creates critical risks:

      Who is affected:

      • ROSA HCP (Red Hat OpenShift Service on AWS - Hosted Control Planes) customers
      • ARO HCP (Azure Red Hat OpenShift - Hosted Control Planes) customers using AWS dependencies
      • Self-managed HyperShift deployments on AWS

      Impact of not migrating:

      • Security risk: Vulnerabilities found in AWS SDK v1 after EOL will not be patched
      • Compliance risk: Security audits will flag unsupported dependencies
      • Technical debt: Migration becomes increasingly costly as the codebase evolves
      • Support burden: AWS will not address SDK v1 issues after EOL

      This is a time-critical engineering investment required to maintain security posture and vendor support for all AWS-based hosted control plane offerings.

      Proposed Solution

      Complete migration of HyperShift's AWS SDK usage from v1 to v2 before the July 31, 2025 EOL deadline, including:

      Core Migration:

      • Migrate all 36 production code files using AWS SDK v1 to v2 equivalents
      • Update configuration patterns from session-based to context-based
      • Modernize service clients to use v2 modular design
      • Refactor error handling to use v2 error types (smithy.APIError)
      • Update credential providers to v2 interfaces
      • Migrate paginators and waiters to v2 explicit constructors
      • Update testing infrastructure and mocks to v2 patterns

      Dependency Updates:

      • Update Cluster API Provider AWS (CAPA) to v2-compatible version
      • Verify controller-runtime AWS SDK compatibility

      Quality Assurance:

      • Comprehensive unit test updates across all affected components
      • Full E2E test validation on AWS platform
      • Performance benchmarking to ensure no regressions
      • Security scanning of new SDK version

      Strategic Value

      Customer Value

      • Security assurance: Continued security patching protects customer infrastructure
      • Compliance: Meets security audit requirements for supported dependencies
      • Reliability: Access to AWS bug fixes maintains cluster stability
      • Innovation: Enables future use of new AWS services and features

      Business Impact

      • Risk mitigation: Prevents security incidents from unpatched SDK vulnerabilities
      • Compliance: Maintains SOC2, ISO27001, FedRAMP compliance postures
      • Support efficiency: AWS support available for SDK-related issues
      • Cost avoidance: Prevents emergency migration costs if a security issue is discovered post-EOL

      Success Criteria

      Completion

      • Zero AWS SDK v1 imports remain in HyperShift codebase
      • All CAPA dependencies updated to v2-compatible versions
      • Migration completed and shipped before July 31, 2025
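
      The "zero AWS SDK v1 imports" criterion could be checked mechanically. The sketch below is one possible approach, not an existing HyperShift tool: it parses only the import block of a Go source file with the standard go/parser package and reports import paths under the v1 module path github.com/aws/aws-sdk-go. The function name v1Imports and the sample source are hypothetical.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
	"strings"
)

// v1Module is the module path of AWS SDK for Go v1. Matching on the module
// path plus a trailing slash keeps v2 paths
// (github.com/aws/aws-sdk-go-v2/...) from being reported.
const v1Module = "github.com/aws/aws-sdk-go"

// v1Imports parses the imports of a Go source file and returns every import
// path that belongs to SDK v1.
func v1Imports(filename, src string) ([]string, error) {
	f, err := parser.ParseFile(token.NewFileSet(), filename, src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var found []string
	for _, spec := range f.Imports {
		path, err := strconv.Unquote(spec.Path.Value)
		if err != nil {
			return nil, err
		}
		if path == v1Module || strings.HasPrefix(path, v1Module+"/") {
			found = append(found, path)
		}
	}
	return found, nil
}

func main() {
	src := `package demo

import (
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go-v2/service/ec2"
)
`
	paths, err := v1Imports("demo.go", src)
	if err != nil {
		panic(err)
	}
	fmt.Println(paths)
}
```

      A check like this could run in CI over every non-vendored file once the migration lands, so that a v1 import cannot be reintroduced unnoticed.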

      Quality

      • 100% unit test pass rate with AWS SDK v2
      • 100% E2E test pass rate for AWS platform (ROSA HCP, standalone)
      • Zero performance regression in AWS operations (EC2, S3, Route53, etc.)
      • Zero functional regressions reported in QE or production

      Adoption

      • Target OpenShift release (4.20 or earlier) ships with migrated code
      • All customer deployments receive migrated version through normal upgrade path
      • Documentation updated with v2 usage patterns

      Security

      • Security scan shows zero critical/high vulnerabilities in AWS SDK dependency
      • Compliance teams confirm supported dependency status
      • No security incidents related to AWS SDK dependencies post-migration

      Scope

      Affected Components (36 files)

      Critical Production Components:

      1. CLI Infrastructure Management (cmd/infra/aws/)

      • Infrastructure creation/destruction workflows
      • AWS services: EC2, Route53, S3, IAM, ELB/ELBv2, RAM, STS
      • Components: VPC/subnet/security group creation, NAT gateways, DNS zone management, IAM role/policy provisioning, load balancer setup
      • Impact: Core hypershift CLI commands for AWS infrastructure provisioning

      2. HyperShift Operator - AWS Platform Controller

      • hypershift-operator/controllers/platform/aws/ - AWS Endpoint Service management
      • hypershift-operator/controllers/hostedcluster/ - S3 OIDC bucket operations
      • hypershift-operator/controllers/nodepool/ - EC2 subnet validation and instance metrics
      • AWS services: EC2, S3, ELBv2
      • Impact: Core reconciliation logic for AWS-hosted clusters (ROSA HCP, standalone AWS)

      3. Control Plane Operator - Private Link Controller

      • control-plane-operator/controllers/awsprivatelink/ - VPC Endpoint Service management
      • control-plane-operator/controllers/awsprivatelink/route53.go - Private DNS integration
      • AWS services: EC2, Route53, KMS
      • Impact: Critical for Private cluster functionality - affects PrivateLink-based customer deployments

      4. Shared AWS Utilities (support/awsutil/)

      • STS role assumption with web identity federation
      • AWS error code handling and retry logic
      • Security group operations
      • Impact: Foundation utilities used across all AWS components

      5. Karpenter Operator - Machine Approver

      • karpenter-operator/controllers/karpenter/machine_approver.go
      • EC2 instance verification for auto-scaling
      • Impact: NodePool auto-scaling with Karpenter

      6. Etcd Backup Service

      • etcd-backup/etcdbackup.go - S3 snapshot uploads
      • Impact: Cluster backup and disaster recovery

      7. CLI Utilities

      • Bastion host creation/destruction
      • Console log retrieval from EC2 instances
      • Impact: Debugging and troubleshooting workflows

      8. E2E Test Infrastructure (9 files)

      • AWS resource creation/cleanup for tests
      • Shared OIDC provider setup
      • KMS encryption validation, tag propagation testing
      • Impact: CI/CD test reliability and coverage

      9. Contrib Tools

      • Zone cleanup utilities
      • Impact: Operations and maintenance scripts

      AWS Services Utilized (7 services)

      • EC2 (16 files) - Instance management, VPC/networking, security groups, volumes
      • Route53 (6 files) - DNS zones, record sets, private hosted zones
      • S3 (5 files) - OIDC provider buckets, etcd backup storage, multipart uploads
      • IAM (4 files) - Roles, policies, OIDC identity providers
      • ELBv2 (4 files) - Application/Network Load Balancers, Endpoint Services
      • ELB Classic (2 files) - Legacy load balancer operations
      • Additional: KMS (encryption), RAM (shared VPC), STS (role assumption), Resource Groups Tagging API

      Technical Patterns that might require migration

      • Session Management: aws/session package usage across all components
      • Credentials: Static credentials, STS web identity, role assumption patterns
      • Error Handling: awserr package for error code extraction and retry logic
      • Testing Interfaces: Heavy use of _iface packages (ec2iface, s3iface, etc.) for mocking
      • Request Customization: Custom user agent headers ("openshift.io/hypershift"), retry handlers
      • S3 Multipart: s3/s3manager for efficient large file uploads

      Epics (Planned)

      Phase 1: Test Infrastructure & CLI tools

      • Epic 1: Migrate E2E test infrastructure to validate patterns and create examples
      • Epic 2: Migrate CLI infrastructure commands (cmd/infra/aws/) - largest code surface

      Phase 2: Operators & supporting components

      • Epic 3: Migrate HyperShift operator AWS controllers
      • Epic 4: Migrate Control Plane operator AWS controllers (PrivateLink, Route53)
      • Epic 5: etcd-backup, contrib tools, aws-encryption-provider

      Phase 3: Docs

      • Epic 6: Update dev documentation, migration guides, developer onboarding materials

      Out of Scope

      • Feature additions beyond migration requirements
      • Performance optimizations not directly related to SDK migration
      • Changes to non-AWS platform support (Azure, PowerVS, KubeVirt, OpenStack, Agent)
      • Refactoring of AWS infrastructure management patterns (keep existing patterns, just update SDK)

      Dependencies

       

      Dependency         | SDKv1      | SDKv2             | Status        | Impact
      CAPA               | ❌ None    | ✅ Fully migrated | ✅ Complete   | No blocker - can use as reference
      Karpenter          | ❌ None    | ✅ Fully migrated | ✅ Complete   | No blocker - can use as reference
      controller-runtime | ❌ None    | ❌ None           | ✅ N/A        | No impact - cloud-agnostic
      HyperShift         | ⚠️ v1.55.7 | Indirect only     | 🔄 To migrate | Primary work needed

       

      Hard Deadline:

      • July 31, 2025: AWS SDK v1 end-of-life - MUST ship before this date

      References

              rhn-support-yli2 Yu Li
              asegurap1@redhat.com Antoni Segura Puimedon
              None
              Juan Manuel Parrilla Madrid, Liangquan Li, Zheng Feng
              Salvatore Dario Minonne Salvatore Dario Minonne
              Liangquan Li Liangquan Li
              Zheng Feng Zheng Feng
              Matthew Werner Matthew Werner
              Kyle Walker Kyle Walker