OpenShift Bugs / OCPBUGS-76337

Scale from zero fails because nodepool IAM policies lack ec2:DescribeInstanceTypes permission

    • Bug Fix
      Cause: Recent updates to cluster-api-provider-aws (CAPA) added native fields for scaling information. The CAPI provider now sets this information instead of the nodepool controller, but it does not have permission to gather the data from EC2.
      Consequence: Scaling fails when the CAPI provider attempts to set the AWSMachineTemplate.Status.Capacity field.
      Fix: The ec2:DescribeInstanceTypes permission has been added to the CAPI provider so that it can gather the information and set the field correctly.
      Result: Scale from zero should now work natively, without issues, once CAPI is updated in HyperShift.

      Description of problem:

      Scale from zero does not work for ROSA HCP clusters because the nodepool IAM policies are missing the ec2:DescribeInstanceTypes permission. For scale from zero to work, the AWSMachineTemplate.Status.Capacity fields must be populated with instance type information from EC2. Without the permission, this information cannot be retrieved.
      
      Previously, the HyperShift operator (which already has this permission) set capacities as annotations.
      With the latest changes in CAPA, the provider reconciles the AWSMachineTemplate.Status.Capacity field instead, so it needs the same permission to query EC2 for the instance type data.
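
      To illustrate the missing permission, the following Go sketch checks whether an IAM policy document allows ec2:DescribeInstanceTypes. It is a simplified illustration only: the two sample policies are hypothetical (they are not the real ROSA nodepool policies), the helper name allowsAction is made up for this example, and the check ignores Deny statements, Resource, Condition, partial wildcards, and the single-object form of Statement.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// stringList accepts either a single JSON string or an array of strings,
// because IAM permits both forms for the "Action" element.
type stringList []string

func (s *stringList) UnmarshalJSON(b []byte) error {
	var one string
	if err := json.Unmarshal(b, &one); err == nil {
		*s = []string{one}
		return nil
	}
	var many []string
	if err := json.Unmarshal(b, &many); err != nil {
		return err
	}
	*s = many
	return nil
}

type statement struct {
	Effect string     `json:"Effect"`
	Action stringList `json:"Action"`
}

type policy struct {
	Statement []statement `json:"Statement"`
}

// allowsAction reports whether any Allow statement lists the action exactly
// (case-insensitively), or through the "*" or "<service>:*" wildcards.
// Simplified on purpose: it is not a full IAM policy evaluator.
func allowsAction(policyJSON []byte, action string) (bool, error) {
	var p policy
	if err := json.Unmarshal(policyJSON, &p); err != nil {
		return false, err
	}
	service, _, _ := strings.Cut(action, ":")
	for _, st := range p.Statement {
		if st.Effect != "Allow" {
			continue
		}
		for _, a := range st.Action {
			if a == "*" || strings.EqualFold(a, action) || strings.EqualFold(a, service+":*") {
				return true, nil
			}
		}
	}
	return false, nil
}

func main() {
	const action = "ec2:DescribeInstanceTypes"

	// Hypothetical policy without the permission (assumed contents).
	before := []byte(`{"Version":"2012-10-17","Statement":[
		{"Effect":"Allow","Action":["ec2:DescribeInstances"],"Resource":"*"}]}`)

	// Hypothetical policy with the permission added.
	after := []byte(`{"Version":"2012-10-17","Statement":[
		{"Effect":"Allow","Action":["ec2:DescribeInstances","ec2:DescribeInstanceTypes"],"Resource":"*"}]}`)

	for name, doc := range map[string][]byte{"before": before, "after": after} {
		ok, err := allowsAction(doc, action)
		if err != nil {
			fmt.Println(name, "policy could not be parsed:", err)
			continue
		}
		fmt.Printf("%s: allows %s = %v\n", name, action, ok)
	}
}
```

      A check like this could be run against an exported policy document before attempting scale from zero. It does not replace the IAM policy simulator.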

      How reproducible:

      Always

      Steps to Reproduce:

      1. Create a ROSA HCP cluster with a nodepool configured for scale-to-zero.
      2. Scale the nodepool down to zero replicas.
      3. Trigger a scale-up event (e.g., deploy a workload that requires the nodepool).
      4. Observe that scale from zero fails or capacity information is not available. 

      Actual results:

      Error when trying to set the capacity:
      Failed to query capacity for instance type \"m5.large\": operation error EC2: DescribeInstanceTypes, https response error StatusCode: 403, RequestID: 4bf99097-c143-4f7d-806f-ea4712159e7a, api error UnauthorizedOperation: You are not authorized to perform this operation. User: arn:aws:sts::820196288204:assumed-role/node-pool-xfbfv-node-pool/1770220345520145531 is not authorized to perform: ec2:DescribeInstanceTypes
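
      When triaging similar failures, the denied principal and action can be pulled out of the message. The Go sketch below does this with a regular expression. The pattern is an assumption derived only from the single log line above, and the names parseDenied and deniedCall are hypothetical. AWS error text may differ in other cases.

```go
package main

import (
	"fmt"
	"regexp"
)

// deniedRe matches the "User: <principal> is not authorized to perform: <action>"
// fragment seen in the log above. The format is assumed from that one sample.
var deniedRe = regexp.MustCompile(`User: (\S+) is not authorized to perform: ([A-Za-z0-9-]+:[A-Za-z0-9*]+)`)

// deniedCall holds the principal and the IAM action extracted from the message.
type deniedCall struct {
	Principal string
	Action    string
}

// parseDenied returns the principal and action if the message contains the
// expected fragment, and false otherwise.
func parseDenied(msg string) (deniedCall, bool) {
	m := deniedRe.FindStringSubmatch(msg)
	if m == nil {
		return deniedCall{}, false
	}
	return deniedCall{Principal: m[1], Action: m[2]}, true
}

func main() {
	msg := `api error UnauthorizedOperation: You are not authorized to perform this operation. ` +
		`User: arn:aws:sts::820196288204:assumed-role/node-pool-xfbfv-node-pool/1770220345520145531 ` +
		`is not authorized to perform: ec2:DescribeInstanceTypes`

	if d, ok := parseDenied(msg); ok {
		fmt.Println("principal:", d.Principal)
		fmt.Println("missing action:", d.Action)
	} else {
		fmt.Println("no authorization failure found in message")
	}
}
```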

      Expected results:

      Capacity properties can be set without permission errors when describing EC2 instance types.

              rh-ee-bclement Borja Clemente Castanera
              Jie Zhao Jie Zhao