Bug
Resolution: Unresolved
Normal
None
4.19
Description of problem:
From the customer's perspective: "The process to destroy a cluster does not complete, hanging during the AWS resource cleanup. The process gets stuck in a loop attempting to delete AWS Backup snapshots. The issue is related to how AWS Backup is configured to not allow deletion of backup snapshots." This happens when AWS Backup policies and snapshots are active on a cluster that is being destroyed.
From the customer: They have global backup policies defined in the AWS Backup vault service in all of their AWS accounts, which take snapshots of all EBS volumes for all workloads running in the account. AWS Backup copies all tag details from the source EBS volume to the backup snapshot.
- When the customer initiates a cluster destroy, the installer gets stuck deleting the snapshot (since it is tagged with "kubernetes.io/cluster/<infra-id> = owned") because the snapshot has not yet expired and is still held by AWS Backup (see the sketch below this list).
- The customer requests that the installer provide a flag to ignore any protected snapshot resources (created by the AWS Backup service) and proceed to delete just the EBS volume.
- The workaround is for the customer to remove the "kubernetes.io/cluster/<infra-id> = owned" tag from the volume snapshots so that the cluster deletion can complete.
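For context, a minimal boto3 sketch of the diagnostic step, assuming a hypothetical infra ID ("mycluster-abc123") and region ("us-east-1"); it lists the snapshots still carrying the cluster ownership tag, i.e. the resources the destroyer keeps retrying to delete:

```python
import boto3

# Hypothetical values; substitute the real infra ID and region of the cluster.
INFRA_ID = "mycluster-abc123"
REGION = "us-east-1"
TAG_KEY = f"kubernetes.io/cluster/{INFRA_ID}"

ec2 = boto3.client("ec2", region_name=REGION)

# Find snapshots owned by this account that are tagged as owned by the cluster.
paginator = ec2.get_paginator("describe_snapshots")
pages = paginator.paginate(
    OwnerIds=["self"],
    Filters=[{"Name": f"tag:{TAG_KEY}", "Values": ["owned"]}],
)
for page in pages:
    for snap in page["Snapshots"]:
        print(snap["SnapshotId"], snap["StartTime"], snap.get("Description", ""))
```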
Version-Release number of selected component (if applicable):
4.19
How reproducible:
Destroy a cluster that has active AWS snapshots created by AWS Backup policies for EC2.
Actual results:
Cluster destroy does not complete.
Expected results:
Cluster destroy completes.
Additional info:
The current workaround is to delete the AWS snapshots manually or to modify the respective tag. Ideally the installer could simply ignore these snapshots: they are managed by AWS Backup, have expiration dates set by the backup plan, and would therefore not be left orphaned.
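A minimal sketch of the tag-removal workaround, again assuming the hypothetical infra ID and region from the sketch above; it strips the "kubernetes.io/cluster/<infra-id> = owned" tag from the AWS Backup managed snapshots so a subsequent openshift-install destroy run can complete, while the snapshots themselves are left to expire under the AWS Backup retention policy:

```python
import boto3

# Hypothetical values; substitute the real infra ID and region of the cluster.
INFRA_ID = "mycluster-abc123"
REGION = "us-east-1"
TAG_KEY = f"kubernetes.io/cluster/{INFRA_ID}"

ec2 = boto3.client("ec2", region_name=REGION)

paginator = ec2.get_paginator("describe_snapshots")
pages = paginator.paginate(
    OwnerIds=["self"],
    Filters=[{"Name": f"tag:{TAG_KEY}", "Values": ["owned"]}],
)
for page in pages:
    for snap in page["Snapshots"]:
        # Remove only the ownership tag; the snapshot itself stays and
        # expires according to its AWS Backup plan.
        ec2.delete_tags(Resources=[snap["SnapshotId"]], Tags=[{"Key": TAG_KEY}])
        print(f"untagged {snap['SnapshotId']}")
```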