Bug
Resolution: Unresolved
Normal
None
4.18, 4.19, 4.20
Description of problem:
The current implementation of "machine-api-provider-openstack" uses the vendored module "cluster-api-provider-openstack" at the old version "v0.9.1" to delete an instance. In this version, "cluster-api-provider-openstack" deletes the network ports BEFORE deleting the instance:
https://github.com/openshift/machine-api-provider-openstack/blob/main/vendor/sigs.k8s.io/cluster-api-provider-openstack/pkg/cloud/services/compute/instance.go#L662
However, the port deletion fails when the port is the instance's primary NIC. Most OpenStack cloud providers set a policy that prevents detaching the primary NIC from an instance, so the request is rejected with a 403:
E1112 16:12:21.618496 1 controller.go:279] openshift-5mj7s-worker-z46h8: failed to delete machine: Request forbidden: [DELETE https://ecs.<cloud_provider_domain>/v2.1/2d8eb6c7cac74059a213aea02fc698bc/servers/d5c1d699-5611-4962-86b7-f4ffa110545c/os-interface/39801849-eb0c-46ea-a0e4-570e838a1c83], error message: {"forbidden":{"message":"Policy doesn't allow compute:detach primary port to be performed.","code":"403"}}
Starting with version "0.10.0", "cluster-api-provider-openstack" performs the port deletion AFTER the instance deletion. The change was made in this commit:
https://github.com/kubernetes-sigs/cluster-api-provider-openstack/commit/7ec4c14aa254748e298a59d1623d700fccf39519
Possible solution: Should we upgrade "cluster-api-provider-openstack" to "0.10", or stay on "v0.9.1" but move the port deletion logic after the instance deletion?
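The second option amounts to reordering two calls. The following is only a minimal sketch of that ordering, not the actual "cluster-api-provider-openstack" code: the computeClient interface, the fakeCloud type, and the function deleteInstanceThenPorts are hypothetical names invented for illustration, and the fake merely imitates a policy that refuses to remove a port while the server still exists.

```go
package main

import (
	"errors"
	"fmt"
)

// computeClient is a hypothetical, minimal abstraction over the OpenStack
// calls involved. It is NOT the real cluster-api-provider-openstack API.
type computeClient interface {
	DeleteServer(serverID string) error
	DeletePort(portID string) error
}

// deleteInstanceThenPorts deletes the server first and only afterwards its
// ports, so no port is still attached as a primary NIC when it is removed.
func deleteInstanceThenPorts(c computeClient, serverID string, portIDs []string) error {
	if err := c.DeleteServer(serverID); err != nil {
		return fmt.Errorf("deleting server %s: %w", serverID, err)
	}
	// Try every port and collect failures instead of stopping at the first.
	var errs []error
	for _, id := range portIDs {
		if err := c.DeletePort(id); err != nil {
			errs = append(errs, fmt.Errorf("deleting port %s: %w", id, err))
		}
	}
	return errors.Join(errs...) // nil when no port deletion failed
}

// errForbidden stands in for the 403 returned by the cloud provider.
var errForbidden = errors.New("policy doesn't allow detaching primary port")

// fakeCloud is an in-memory stand-in (assumption) that rejects port deletion
// while the server still exists and records the successful calls in order.
type fakeCloud struct {
	serverExists bool
	calls        []string
}

func (f *fakeCloud) DeleteServer(id string) error {
	f.calls = append(f.calls, "server:"+id)
	f.serverExists = false
	return nil
}

func (f *fakeCloud) DeletePort(id string) error {
	if f.serverExists {
		return errForbidden
	}
	f.calls = append(f.calls, "port:"+id)
	return nil
}

func main() {
	cloud := &fakeCloud{serverExists: true}
	err := deleteInstanceThenPorts(cloud, "server-1", []string{"port-a", "port-b"})
	fmt.Println(err, cloud.calls)
}
```

With the old ordering (ports first), the same fake would return errForbidden for every port, which mirrors the failure in the log above.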
Version-Release number of selected component (if applicable):
main
How reproducible:
Reproducible on OpenStack cloud providers whose policy prevents the deletion of an instance's primary NIC.
Steps to Reproduce:
1. Create a MachineSet successfully.
2. Delete the given MachineSet.
Actual results:
Machine deletion fails with the error shown above.
Expected results:
The machine is deleted successfully.
Additional info: