Type: Bug
Resolution: Unresolved
Priority: Normal
Fix Version/s: None
Affects Version/s: 4.17, 4.18, 4.19, 4.20, 4.21
Component/s: HyperShift / ROSA
Labels:
None

Activity Type:
None
Blocked:
False
Blocked Reason:

None
Story Points:
None
Severity:
None
Regression:
None

Target Backport Versions:
None
Target Version:
None
Release Blocker:
None
Sprint:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Impact Score:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

    While running many tests of new features for the RHCS Terraform provider, we have seen frequent failures where Terraform cannot delete the VPC used for testing HCP clusters because resources still depend on it. When we try to delete the VPC manually through the AWS console, it shows a leftover security group whose name follows the pattern `${CLUSTER_ID}-vpce-private-router` (e.g. `2o78lq0lmr849st1se7ove3pb629ujts-vpce-private-router`).

The VPCs that hit this issue on 01/30 and 01/31 last week (I haven't investigated logs older than that yet) have since been cleaned up, though likely only by a weekend cleanup job. I have found at least one VPC from 02/01 whose security group still had not been deleted as of 02/02 5 PM EST.

Version-Release number of selected component (if applicable):

    I'm not sure what to put here, but I have seen this happen after creating clusters running both OCP 4.18 and OCP 4.20.

How reproducible:

    ~40%. Of the 31 tests currently listed at https://prow.ci.openshift.org/?repo=terraform-redhat%2Fterraform-provider-rhcs, 12 appear to have failed with the error "The vpc '{vpc-id}' has dependencies and cannot be deleted". I also ran into this with both clusters I spun up using Terraform on 01/30/2026.

Steps to Reproduce:

    1. Create a VPC
    2. Create an HCP cluster in that VPC
    3. Delete the cluster
    4. Attempt to delete the VPC

My testing has only been with the RHCS Terraform provider, so in practice these steps amount to defining everything in a Terraform file, running "terraform apply", and then running "terraform destroy" a while later.

Actual results:

    Even after the VPC endpoint has been deleted, the private router security group (`${CLUSTER_ID}-vpce-private-router`) still exists and prevents the VPC from being deleted.
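
For triage, it can help to map a leftover security group back to the cluster that created it. The following is a minimal Go sketch that assumes the naming pattern observed in this report (`${CLUSTER_ID}-vpce-private-router`) holds for all such groups. The function name `ClusterIDFromGroupName` is hypothetical and does not come from the HyperShift code.

```go
package main

import "strings"

// privateRouterSuffix is the suffix observed on the leftover security group
// names in this report. Treating it as a stable convention is an assumption.
const privateRouterSuffix = "-vpce-private-router"

// ClusterIDFromGroupName (hypothetical helper) extracts the cluster ID from a
// security group name of the form "<cluster-id>-vpce-private-router".
// The second return value is false when the name does not match the pattern.
func ClusterIDFromGroupName(name string) (string, bool) {
	if !strings.HasSuffix(name, privateRouterSuffix) {
		return "", false
	}
	id := strings.TrimSuffix(name, privateRouterSuffix)
	if id == "" {
		// A bare suffix carries no cluster ID, so treat it as a non-match.
		return "", false
	}
	return id, true
}
```

Running this over the security group names remaining in a stuck VPC would list the cluster IDs whose cleanup did not complete.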

Expected results:

    All AWS resources linked to a BYO-VPC are removed shortly after the cluster has been fully deleted, and the VPC can then be deleted successfully.

Additional info:

    After talking to a few folks in #forum-rosa-eng (https://redhat-internal.slack.com/archives/C0A8S4L99C2/p1770065919408449), we believe there's a bug in the control-plane-operator code.

I used Claude to help narrow down whether the issue was in our code. After I pointed it at the control-plane-operator code, it suggested that around here (https://github.com/openshift/hypershift/blob/a8694da4b07ef5cb809ad6d7438ae62f4d7dd3ac/control-plane-operator/controllers/awsprivatelink/awsprivatelink_controller.go#L1023) the code deletes the VPC endpoint and then immediately tries to delete the security group, which can supposedly fail if AWS has not finished deleting the VPC endpoint yet.

Assignee: Salvatore Dario Minonne

Reporter: Jericho Keyne

Need Info From: None

Contributors: None

QA Contact: Jie Zhao

Doc Contact: None

Votes: 1

Watchers: 7

Created: 2026/02/02 10:14 PM

Updated: 2026/02/11 1:19 AM
