[OCPBUGS-44698] Shared VPC: AWS client fails to assume role when token creation is delayed

    • Bug
    • Resolution: Unresolved
    • Undefined
    • None
    • 4.17.z, 4.18
    • HyperShift
    • None
    • Important
    • None
    • Hypershift Sprint 263
    • 1
    • Proposed
    • False

      Description of problem:

          In the integration environment, creating a ROSA HostedCluster with a shared VPC results in a VPC endpoint that never becomes available.

      Version-Release number of selected component (if applicable):

          4.17.3

      How reproducible:

          Sometimes (currently every time in integration, but could be due to timing)

      Steps to Reproduce:

          1. Create a HostedCluster with shared VPC
          2. Wait for HostedCluster to come up
          

      Actual results:

      VPC endpoint never gets created due to errors like:
      {"level":"error","ts":"2024-11-18T20:37:51Z","msg":"Reconciler error","controller":"awsendpointservice","controllerGroup":"hypershift.openshift.io","controllerKind":"AWSEndpointService","AWSEndpointService":{"name":"private-router","namespace":"ocm-int-2f4labdgi2grpumbq5ufdsfv7nv9ro4g-cse2etests-gdb"},"namespace":"ocm-int-2f4labdgi2grpumbq5ufdsfv7nv9ro4g-cse2etests-gdb","name":"private-router","reconcileID":"bc5d8a6c-c9ad-4fc8-8ead-6b6c161db097","error":"failed to create vpc endpoint: UnauthorizedOperation","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:222"}
          

      Expected results:

          VPC endpoint gets created

      Additional info:

          Deleting the control plane operator pod gets things working again.
      The theory is that if the control plane operator pod is delayed in obtaining a web identity token, the AWS client will not assume the role that was passed to it.

      Currently the client is created only once, at startup; we should create it on every reconcile.

            Ohad Aharoni added a comment - Successfully created a shared VPC on HCP Cluster with everything available.

            Aadarsh Raj added a comment - rh-ee-oaharoni is the feature owner for shared VPC, so I have reassigned it to him.

            Jie Zhao added a comment - Hi rh-ee-aaraj, I've reassigned the QA contact to you since you are testing rosa-hcp with the shared VPC, but you can reassign it to someone else who should test it. Thanks!

            Ying Zhang added a comment - rhn-support-jiezhao cewong@redhat.com Creating an HCP with a shared VPC via the rosa CLI is not supported. Also, ocmqe-aaraj is one of the HCP feature owners who tests HCP.

            OpenShift Jira Bot added a comment - Hi cewong@redhat.com, Bugs should not be moved to Verified without first providing a Release Note Type ("Bug Fix" or "No Doc Update"), and for type "Bug Fix" the Release Note Text must also be provided. Please populate the necessary fields before moving the Bug to Verified.

              cewong@redhat.com Cesar Wong
              cewong.openshift Cesar Wong (Inactive)
              Ohad Aharoni Ohad Aharoni