OCPBUGS-45184 (OpenShift Bugs)

Shared VPC: AWS client fails to assume role when token creation is delayed


    • Bug
    • Resolution: Done
    • Undefined
    • None
    • 4.17.z, 4.18
    • HyperShift
    • Important
    • None
    • Proposed
    • False
    • Previously, when you created a hosted cluster in a shared VPC, the private link controller sometimes failed to assume the shared VPC role to manage the VPC endpoints in the shared VPC. With this release, a client is created for every reconciliation in the private link controller so that you can recover from invalid clients. As a result, the hosted cluster endpoints and the hosted cluster are created successfully. (link:https://issues.redhat.com/browse/OCPBUGS-45184[*OCPBUGS-45184*])
    • Bug Fix
    • Done

      This is a clone of issue OCPBUGS-44698. The following is the description of the original issue:

      Description of problem:

          In the integration environment, creating a ROSA HostedCluster with a shared VPC results in a VPC endpoint that is not available.

      Version-Release number of selected component (if applicable):

          4.17.3

      How reproducible:

          Sometimes (currently every time in integration, but could be due to timing)

      Steps to Reproduce:

          1. Create a HostedCluster with shared VPC
          2. Wait for HostedCluster to come up
          

      Actual results:

      VPC endpoint never gets created due to errors like:
      {"level":"error","ts":"2024-11-18T20:37:51Z","msg":"Reconciler error","controller":"awsendpointservice","controllerGroup":"hypershift.openshift.io","controllerKind":"AWSEndpointService","AWSEndpointService":{"name":"private-router","namespace":"ocm-int-2f4labdgi2grpumbq5ufdsfv7nv9ro4g-cse2etests-gdb"},"namespace":"ocm-int-2f4labdgi2grpumbq5ufdsfv7nv9ro4g-cse2etests-gdb","name":"private-router","reconcileID":"bc5d8a6c-c9ad-4fc8-8ead-6b6c161db097","error":"failed to create vpc endpoint: UnauthorizedOperation","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:222"}
          

      Expected results:

          VPC endpoint gets created

      Additional info:

          Deleting the control plane operator pod gets things working again.
      The theory is that if the control plane operator pod is delayed in obtaining a web identity token, the client it creates at startup does not assume the role that was passed to it, and it keeps failing for the lifetime of the pod.
      
      Currently the client is created only once, at startup. It should be created on every reconcile so that a later reconcile can recover from an invalid client.
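
      The difference between creating the client once and creating it on every reconcile can be sketched as below. This is a minimal simulation, not the HyperShift code: EndpointClient, fakeClient, tokenSource, cachedReconciler and perReconcileReconciler are all hypothetical names invented for illustration, and no real AWS SDK calls are made. It assumes that a client built before the web identity token is available never assumes the role, which is the theory stated above.

```go
package main

import (
	"errors"
	"fmt"
)

// EndpointClient is a hypothetical stand-in for the AWS client that
// manages VPC endpoints in the shared VPC.
type EndpointClient interface {
	CreateVPCEndpoint() error
}

// fakeClient simulates a client whose permissions are fixed at build time:
// if the role was not assumed then, every later call is unauthorized.
type fakeClient struct{ roleAssumed bool }

func (c fakeClient) CreateVPCEndpoint() error {
	if !c.roleAssumed {
		return errors.New("failed to create vpc endpoint: UnauthorizedOperation")
	}
	return nil
}

// ClientBuilder constructs a new client from the current token state.
type ClientBuilder func() (EndpointClient, error)

// tokenSource simulates a web identity token that arrives late (assumption).
type tokenSource struct{ ready bool }

func (t *tokenSource) builder() ClientBuilder {
	return func() (EndpointClient, error) {
		return fakeClient{roleAssumed: t.ready}, nil
	}
}

// cachedReconciler mirrors the old behavior: one client, built at startup.
type cachedReconciler struct{ client EndpointClient }

func (r *cachedReconciler) Reconcile() error {
	return r.client.CreateVPCEndpoint()
}

// perReconcileReconciler mirrors the proposed behavior: a fresh client
// is built at the start of every reconcile.
type perReconcileReconciler struct{ build ClientBuilder }

func (r *perReconcileReconciler) Reconcile() error {
	client, err := r.build()
	if err != nil {
		return fmt.Errorf("failed to build client: %w", err)
	}
	return client.CreateVPCEndpoint()
}

func main() {
	token := &tokenSource{}
	build := token.builder()

	// Both reconcilers start before the token is available.
	startupClient, err := build()
	if err != nil {
		panic(err)
	}
	cached := &cachedReconciler{client: startupClient}
	fresh := &perReconcileReconciler{build: build}

	fmt.Println("before token, errors:", cached.Reconcile() != nil, fresh.Reconcile() != nil)

	token.ready = true // the token shows up later

	fmt.Println("after token, errors:", cached.Reconcile() != nil, fresh.Reconcile() != nil)
}
```

      In this simulation the cached client keeps returning UnauthorizedOperation after the token arrives, which matches the observation that only deleting the pod helps, while the per-reconcile variant succeeds on its next attempt.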

              cewong@redhat.com Cesar Wong
              openshift-crt-jira-prow OpenShift Prow Bot
              Ohad Aharoni Ohad Aharoni (Inactive)