-
Bug
-
Resolution: Unresolved
-
4.20.z
Description of problem:
During provisioning of a ROSA cluster with AWS PrivateLink enabled, if the installation fails, the subsequent automatic cleanup process becomes stuck. This prevents the cluster from being de-provisioned and blocks new installation attempts on the same subnets.
The root cause is an IAM permission gap: the `Installer-Role` lacks the elasticloadbalancing:DeleteListener permission. When the installer attempts to clean up Network Load Balancer (NLB) resources after a failure, it receives an AccessDenied error, leaving listeners, target groups, and associated VPC Endpoint Services as orphaned resources. These orphaned resources then cause dependency violations (ResourceInUse) during subsequent cleanup or installation attempts.
Version-Release number of selected component (if applicable):
The permission definition is version-agnostic and likely affects all STS-based installations using the standard IAM policies.
How reproducible:
Reproducible when the following conditions are met:
1. Cluster is configured with AWS PrivateLink (aws_private_link = true).
2. Installation fails before completion.
3. The automated post-failure cleanup process is triggered.
Steps to Reproduce:
1. Create an STS-based OpenShift cluster in a private VPC with aws_private_link = true.
2. Induce an installation failure (e.g., provide invalid machine CIDR, cause a bootstrap timeout).
3. Observe the installation logs. The failure will be followed by logs indicating "cleaning up resources from previous provision attempt".
4. The logs will show repeated AccessDenied: ... is not authorized to perform: elasticloadbalancing:DeleteListener ... errors.
5. The cleanup will hang, and AWS resources (NLB listeners, target groups, VPC Endpoint Services) will remain orphaned. Any new cluster creation in the same subnets may fail with ResourceInUse or generic Egress Inflight Check failures.
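To confirm step 4 without reading the log by eye, the denied actions can be pulled out of the installation log and counted. The following is a minimal sketch, not part of any installer tooling: the function name `denied_actions` is hypothetical, and the regular expression assumes the log lines contain the "is not authorized to perform: <service>:<Action>" fragment quoted above.

```python
import re
from collections import Counter

# Captures "<service>:<Action>" from the AccessDenied message fragment
# quoted in this report. The exact surrounding log format is an assumption.
DENIED_ACTION = re.compile(
    r"is not authorized to perform: ([A-Za-z0-9-]+:[A-Za-z0-9]+)"
)


def denied_actions(log_lines):
    """Count how often each IAM action is denied in the given log lines."""
    counts = Counter()
    for line in log_lines:
        if "AccessDenied" not in line:
            continue  # skip unrelated log lines
        match = DENIED_ACTION.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Illustrative lines modeled on the messages quoted in this report.
    sample = [
        "cleaning up resources from previous provision attempt",
        "AccessDenied: User: ... assumed-role/...-Installer-Role/... "
        "is not authorized to perform: elasticloadbalancing:DeleteListener",
    ]
    for action, count in denied_actions(sample).items():
        print(f"{action}: denied {count} time(s)")
```

A repeated count for elasticloadbalancing:DeleteListener, with no other denied actions, would be consistent with the cleanup loop described here.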
Actual results:
- Cluster installation fails and cannot be automatically cleaned up.
- Orphaned AWS resources block subsequent operations.
- Installation logs show: AccessDenied: User: ... assumed-role/...-Installer-Role/... is not authorized to perform: elasticloadbalancing:DeleteListener
Expected results:
The Installer Role should have all necessary permissions to clean up every AWS resource it creates, including NLB Listeners.
Upon installation failure, the deprovisioning process should complete fully and automatically, leaving no orphaned resources.
Subsequent installation attempts should proceed without conflicts from previous failures.
Additional info:
The missing permission can be traced to the openshift/installer repository. In pkg/asset/installconfig/aws/permissions.go, the PermissionDeleteBase group includes DeleteLoadBalancer and DeleteTargetGroup but omits DeleteListener.
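The gap described above can also be checked directly against an IAM policy document before an installation is attempted. The sketch below is a simplified illustration, not the installer's code: the helper names and the sample policy are hypothetical, and the evaluation only looks at `Allow` statements and their `Action` patterns, ignoring `Deny`, `NotAction`, `Resource`, and `Condition`.

```python
from fnmatch import fnmatchcase

# The three delete actions named in this report.
REQUIRED_DELETE_ACTIONS = [
    "elasticloadbalancing:DeleteLoadBalancer",
    "elasticloadbalancing:DeleteTargetGroup",
    "elasticloadbalancing:DeleteListener",
]


def _as_list(value):
    """IAM allows a single item or a list for Statement and Action."""
    if value is None:
        return []
    return [value] if isinstance(value, (str, dict)) else list(value)


def allowed_patterns(policy):
    """Collect lower-cased Action patterns from all Allow statements."""
    patterns = []
    for statement in _as_list(policy.get("Statement")):
        if statement.get("Effect") != "Allow":
            continue
        patterns.extend(a.lower() for a in _as_list(statement.get("Action")))
    return patterns


def missing_actions(policy, required=REQUIRED_DELETE_ACTIONS):
    """Return the required actions that no Allow pattern covers.

    IAM action names are case-insensitive and support '*' and '?'
    wildcards, which fnmatchcase approximates on lower-cased strings.
    """
    patterns = allowed_patterns(policy)
    return [
        action
        for action in required
        if not any(fnmatchcase(action.lower(), p) for p in patterns)
    ]


if __name__ == "__main__":
    # Hypothetical policy mirroring the gap described in this report.
    sample_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "elasticloadbalancing:DeleteLoadBalancer",
                    "elasticloadbalancing:DeleteTargetGroup",
                ],
                "Resource": "*",
            }
        ],
    }
    print(missing_actions(sample_policy))
```

Run against a policy shaped like the sample, the check reports elasticloadbalancing:DeleteListener as uncovered. A policy that grants `elasticloadbalancing:Delete*` would pass, so the check works for both explicit and wildcard grants.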