OCP Technical Release Team / TRT-2428

AWS install failures due to rate limiting with IAM roles


    • Ticket
    • Resolution: Unresolved
    • Critical
    • Incidents & Support

      We are seeing install failures across many different AWS jobs, both payload jobs and others. The failures look like:

      level=info msg=Creating infrastructure resources...
      level=info msg=Reconciling IAM roles for control-plane and compute nodes
      level=info msg=Creating IAM role for master
      level=error msg=failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed during pre-provisioning: failed to create IAM roles: failed to get master instance profile: operation error IAM: GetInstanceProfile, exceeded maximum number of attempts, 3, https response error StatusCode: 400, RequestID: 03248d48-8f48-45c9-a876-1920b143c4e7, api error Throttling: Rate exceeded
      Installer exit with code 4
      

      There is some discussion about this here, as well as here.

      This results from the AWS SDK v2 client (which is also in use in 4.20). The client has been in use for quite a while, and we are not sure why it is suddenly failing so frequently. Speculation is that it is due to the frequency of jobs in the accounts.

      This means that there really isn't anything to revert here. The potential fix is: https://github.com/openshift/installer/pull/10112
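
For illustration only, here is a minimal sketch of the general technique usually applied to "Throttling: Rate exceeded" errors: retrying with more attempts and exponential backoff with jitter. It is not taken from the linked pull request or from the installer's code, and it does not use the AWS SDK. The names `ThrottlingError`, `call_with_backoff`, and `make_flaky` are hypothetical, and the attempt count and delays are assumed values, not recommendations.

```python
import random
import time


class ThrottlingError(Exception):
    """Hypothetical stand-in for an API error like 'Throttling: Rate exceeded'."""


def call_with_backoff(operation, max_attempts=8, base_delay=0.5, max_delay=20.0,
                      sleep=time.sleep, rng=random.random):
    """Call operation(), retrying on ThrottlingError with jittered exponential backoff.

    The last ThrottlingError is re-raised once max_attempts calls have failed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ThrottlingError:
            if attempt == max_attempts:
                raise
            # Full jitter: sleep a random fraction of the capped exponential delay,
            # so that many concurrent jobs do not retry in lockstep.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(rng() * cap)


def make_flaky(failures):
    """Build a fake call that is throttled `failures` times, then succeeds."""
    state = {"calls": 0}

    def get_instance_profile():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise ThrottlingError("Rate exceeded")
        return "master-profile"

    return get_instance_profile, state


if __name__ == "__main__":
    op, state = make_flaky(4)
    # With only 3 attempts (as in the log above) this call would fail;
    # with the assumed default of 8 attempts it succeeds on the fifth call.
    print(call_with_backoff(op, sleep=lambda seconds: None), state["calls"])
```

The `sleep` and `rng` parameters exist only so the sketch can be exercised without real waiting. Jitter matters in this scenario because many jobs sharing one account would otherwise retry at the same moments and keep hitting the same rate limit.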

              Unassigned
              sgoeddel@redhat.com Stephen Goeddel