-
Bug
-
Resolution: Unresolved
-
Normal
-
None
-
4.17, 4.18, 4.19, 4.20, 4.21
Description of problem:
While running a lot of tests for new features in the RHCS terraform provider, we've been seeing frequent failures where terraform can't delete the VPC we used for testing HCP clusters because resources still depend on it. When we try to delete the VPC manually through the AWS console, it shows a leftover security group whose name follows the pattern `${CLUSTER_ID}-vpce-private-router` (e.g. 2o78lq0lmr849st1se7ove3pb629ujts-vpce-private-router).
The VPCs that hit this issue on 01/30 and 01/31 last week (I haven't investigated logs older than that yet) have since been cleaned up, though likely just by a weekend cleanup job. I've found at least one VPC from 02/01 where the security group still hasn't been deleted (as of 02/02 5PM EST).
Version-Release number of selected component (if applicable):
I'm not fully sure what to put here, but I have seen this happen after creating clusters running both 4.18 and 4.20 OCP versions.
How reproducible:
~40%. Of the 31 tests that currently show up at https://prow.ci.openshift.org/?repo=terraform-redhat%2Fterraform-provider-rhcs, 12 seem to have failed with the error "The vpc '{vpc-id}' has dependencies and cannot be deleted". I also ran into this with both of the clusters I spun up using terraform on 01/30/2026.
Steps to Reproduce:
1. Create a VPC
2. Create an HCP cluster in that VPC
3. Delete the cluster
4. Attempt to delete the VPC
My testing has only been with the RHCS terraform provider, so in practice these steps amount to defining everything in a terraform file, running "terraform apply", and then running "terraform destroy" a while later.
Actual results:
Even after the VPC endpoint has been deleted, the private router security group (`${CLUSTER_ID}-vpce-private-router`) is still around and prevents the VPC from being deleted.
Expected results:
All AWS resources that the cluster created in a BYO-VPC are removed shortly after the cluster has been fully deleted, and the VPC can then be deleted successfully.
Additional info:
After talking to a few folks in #forum-rosa-eng (https://redhat-internal.slack.com/archives/C0A8S4L99C2/p1770065919408449), we believe there's a bug in the control-plane-operator code. I was using claude to help me narrow down whether the issue was in our code or not. After I pointed it at the control-plane-operator code, it suggested the problem is around here: https://github.com/openshift/hypershift/blob/a8694da4b07ef5cb809ad6d7438ae62f4d7dd3ac/control-plane-operator/controllers/awsprivatelink/awsprivatelink_controller.go#L1023. The code deletes the VPC endpoint and then immediately tries to delete the security group, which supposedly can fail if AWS hasn't actually finished deleting the VPC endpoint yet. This has not been confirmed against the controller's logs.