Bug
Resolution: Unresolved
4.17
Quality / Stability / Reliability
Description of problem:
When destroying an agent cluster, problems deprovisioning the agents can prevent the cluster from being deleted. For example, today we encountered an issue in which the DNS records for a cluster were deleted at the same time as the cluster, leading to this situation:
$ k -n hardware-inventory get agent 2f25a998-0f1d-c202-4fdd-a2c300c9b7da -o json | jq .status.deprovision_info
{
  "cluster_name": "london",
  "cluster_namespace": "clusters-london",
  "message": "failed to get API group resources: unable to retrieve the complete list of server APIs: v1: Get \"https://api.london.int.massopen.cloud:30894/api/v1\": dial tcp: lookup api.london.int.massopen.cloud on 172.30.0.10:53: no such host",
  "node_name": "moc-r4pac24u35-s1c"
}
This information isn't particularly discoverable without a fair amount of a priori knowledge about hosted control planes. In particular:
- There is no indication of a problem in the output of `kubectl get agents`.
- There is no indication of a problem in the .status attribute of either the HostedCluster resource or the HostedControlPlane resource.
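As a stopgap, the agents that are blocking a deletion can be found by scanning every Agent resource for a populated .status.deprovision_info. The following is a minimal sketch rather than a supported tool. It assumes the input is the JSON produced by `kubectl get agents -A -o json`, that deprovision_info has the shape shown above, and that a non-empty "message" indicates a problem. The function name `stuck_agents` and the file name `agents.json` are hypothetical.

```python
import json
import sys


def stuck_agents(agent_list):
    """Yield (namespace, name, cluster_name, message) for each agent whose
    .status.deprovision_info carries a non-empty message.

    Assumption: a non-empty message means deprovisioning hit a problem.
    """
    for item in agent_list.get("items", []):
        info = (item.get("status") or {}).get("deprovision_info") or {}
        message = info.get("message")
        if message:
            meta = item.get("metadata") or {}
            yield (
                meta.get("namespace"),
                meta.get("name"),
                info.get("cluster_name"),
                message,
            )


def main(path):
    # Load a saved `kubectl get agents -A -o json` dump and print one line per agent.
    with open(path) as handle:
        agent_list = json.load(handle)
    for namespace, name, cluster, message in stuck_agents(agent_list):
        print(f"{namespace}/{name} (cluster {cluster}): {message}")


# Runs only when a file path is passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Possible usage, assuming the script is saved as `stuck_agents.py`: run `kubectl get agents -A -o json > agents.json`, then `python stuck_agents.py agents.json`. This only works around the discoverability problem; it does not address the request that the failure be surfaced on the resources themselves.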
Version-Release number of selected component (if applicable):
We're running ACM 2.12.2.
How reproducible:
Delete the DNS records for a cluster, and then attempt to delete the cluster.
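Actual results:
The cluster is not deleted, and the cause is visible only in .status.deprovision_info on the affected Agent resource.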
Expected results:
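Issues that prevent a cluster from being deleted should be surfaced to the cluster administrator more visibly, for example in the output of `kubectl get agents` or in the .status of the HostedCluster or HostedControlPlane resource.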
Additional info: