OpenShift Bugs / OCPBUGS-44698

Shared VPC: AWS client fails to assume role when token creation is delayed


    • Bug
    • Resolution: Unresolved
    • Undefined
    • None
    • 4.17.z, 4.18
    • HyperShift
    • None
    • Important
    • None
    • Proposed
    • False

      Description of problem:

          In the integration environment, creating a ROSA HostedCluster with a shared VPC results in a VPC endpoint that never becomes available.

      Version-Release number of selected component (if applicable):

          4.17.3

      How reproducible:

          Sometimes (currently every time in integration, but that could be due to timing)

      Steps to Reproduce:

          1. Create a HostedCluster with shared VPC
          2. Wait for HostedCluster to come up
          

      Actual results:

      The VPC endpoint is never created. The awsendpointservice controller logs errors like:
      {"level":"error","ts":"2024-11-18T20:37:51Z","msg":"Reconciler error","controller":"awsendpointservice","controllerGroup":"hypershift.openshift.io","controllerKind":"AWSEndpointService","AWSEndpointService":{"name":"private-router","namespace":"ocm-int-2f4labdgi2grpumbq5ufdsfv7nv9ro4g-cse2etests-gdb"},"namespace":"ocm-int-2f4labdgi2grpumbq5ufdsfv7nv9ro4g-cse2etests-gdb","name":"private-router","reconcileID":"bc5d8a6c-c9ad-4fc8-8ead-6b6c161db097","error":"failed to create vpc endpoint: UnauthorizedOperation","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:222"}
          

      Expected results:

          VPC endpoint gets created

      Additional info:

          Deleting the control plane operator pod gets things working again.
          The theory is that if the control plane operator pod is delayed in obtaining its web identity token, the AWS client does not assume the role that was passed to it.

          Currently the client is created only once, at startup. We should create it on every reconcile instead.

              agarcial@redhat.com Alberto Garcia Lamela
              cewong.openshift Cesar Wong
              Jie Zhao Jie Zhao