Bug
Resolution: Cannot Reproduce
rhos-18.0.10 FR 3
rhos-workloads-compute
Moderate
When removing and then redeploying a compute node, the process fails due to a pymysql.err.IntegrityError: (1062, "Duplicate entry ...") in the Nova database.
This issue occurs because deleting the nova-compute service succeeds even when instances are still present on the host. The service deletion API call removes the host mapping, the service record, and the host's resource provider in Placement, but it does not delete the compute_nodes record from the database, which is left orphaned.
When the compute node is subsequently reprovisioned, Nova attempts to create a new compute_nodes record, which conflicts with the orphaned record and violates the database's unique constraint, leading to the "Duplicate entry" error.
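The conflict with the orphaned record can be illustrated with a minimal, self-contained sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and a simplified compute_nodes table whose columns and unique constraint over (host, hypervisor_hostname, deleted) are assumptions modeled loosely on Nova's schema, not copied from it. SQLite raises sqlite3.IntegrityError where MySQL returns error 1062. The function name simulate_redeploy is hypothetical.

```python
import sqlite3


def simulate_redeploy(host: str, hypervisor_hostname: str,
                      soft_delete_orphan: bool = False) -> str:
    """Re-register a compute node on a toy schema and report the outcome.

    Returns the SQLite error text on a unique-constraint conflict, or
    "no conflict" if the second insert succeeds.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Assumed, simplified schema: live rows use deleted = 0 and
        # soft-deleted rows store their own id in 'deleted'.
        conn.execute(
            "CREATE TABLE compute_nodes ("
            " id INTEGER PRIMARY KEY,"
            " host TEXT NOT NULL,"
            " hypervisor_hostname TEXT NOT NULL,"
            " deleted INTEGER NOT NULL DEFAULT 0,"
            " UNIQUE (host, hypervisor_hostname, deleted))"
        )
        insert = ("INSERT INTO compute_nodes (host, hypervisor_hostname) "
                  "VALUES (?, ?)")
        # Initial deployment creates the record.
        conn.execute(insert, (host, hypervisor_hostname))
        if soft_delete_orphan:
            # Models a cleanup step that marks the old row as deleted.
            conn.execute("UPDATE compute_nodes SET deleted = id "
                         "WHERE deleted = 0")
        # Redeployment tries to create a new live row for the same node.
        conn.execute(insert, (host, hypervisor_hostname))
        return "no conflict"
    except sqlite3.IntegrityError as exc:
        return str(exc)
    finally:
        conn.close()
```

Calling simulate_redeploy("compute-0", "compute-0") returns SQLite's "UNIQUE constraint failed" message because the old row is still live; with soft_delete_orphan=True the old row no longer matches deleted = 0 and the second insert succeeds.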
To Reproduce
Steps to reproduce the behavior:
- Deploy a compute node and launch an instance on it.
- Disable and delete the nova-compute service for that node.
- Observe that the service deletion succeeds, despite the presence of an instance.
- Attempt to redeploy the same compute node.
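Steps 1 to 3 above can be expressed as openstack CLI calls. The sketch below only builds and prints the argument lists; it does not run them. The flavor, image, network, instance name ("repro-instance"), the "nova" availability zone, and the SERVICE_ID placeholder are assumptions to adapt to a given environment, and build_repro_commands is a hypothetical helper. Step 4 (redeploying the node) depends on the deployment tooling and is left out.

```python
import shlex
from typing import List


def build_repro_commands(host: str, service_id: str, flavor: str,
                         image: str, network: str) -> List[List[str]]:
    """Return openstack CLI argv lists for reproduction steps 1 to 3."""
    return [
        # Step 1: boot an instance on the target host. The "zone:host"
        # availability-zone form requires admin rights; "nova" is an
        # assumed zone name.
        ["openstack", "server", "create", "--flavor", flavor,
         "--image", image, "--network", network,
         "--availability-zone", f"nova:{host}", "--wait", "repro-instance"],
        # Step 2: disable, then delete, the nova-compute service.
        ["openstack", "compute", "service", "set", "--disable",
         host, "nova-compute"],
        ["openstack", "compute", "service", "delete", service_id],
        # Step 3: after the delete succeeds, confirm the instance is
        # still listed on the host.
        ["openstack", "server", "list", "--all-projects", "--host", host],
    ]


if __name__ == "__main__":
    for argv in build_repro_commands("compute-0.example.com", "SERVICE_ID",
                                     "m1.small", "cirros", "private"):
        print(shlex.join(argv))
```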
Expected behavior
- The service deletion should fail if there are instances on the host, as per the check in nova/api/openstack/compute/services.py:
https://github.com/openstack/nova/blob/8b81b5f91ffe1f9c38a483d151b82316d443dbf6/nova/api/openstack/compute/services.py#L268-L274
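The behavior the linked check is expected to enforce can be sketched as follows. This is a hypothetical, standalone illustration, not the code at that URL: ServiceHasInstancesError, delete_compute_service, and the count_instances and destroy_service callables are invented names, and the exception stands in for whatever error response the API returns.

```python
from typing import Callable


class ServiceHasInstancesError(Exception):
    """Hypothetical error standing in for the API's rejection response."""


def delete_compute_service(host: str,
                           count_instances: Callable[[str], int],
                           destroy_service: Callable[[str], None]) -> None:
    """Refuse to delete a compute service that still hosts instances."""
    num_instances = count_instances(host)
    if num_instances:
        raise ServiceHasInstancesError(
            f"Unable to delete compute service on {host}: "
            f"{num_instances} instance(s) still present. "
            "Migrate or delete the instances first.")
    # In this sketch, destroy_service stands for all cleanup (service
    # record, host mapping, compute_nodes row, resource provider).
    destroy_service(host)
```

Passing the lookups in as callables keeps the guard testable without a database: the reported bug corresponds to destroy_service being reached even though count_instances would return a non-zero value.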
Bug impact
- This is a bug in the service deletion logic. The check for existing instances is not functioning as expected, which leads to an inconsistent state in the Nova database and prevents the successful redeployment of compute nodes. This creates a significant operational issue for anyone needing to perform maintenance or hardware replacement on compute nodes.
Known workaround
- Possibly: manually delete the orphaned compute_nodes record from the Nova cell database (not yet confirmed).
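If that workaround is pursued, the first step would be identifying which compute_nodes rows are orphaned. The sketch below does this read-only lookup against an in-memory SQLite database with a simplified, assumed schema (only id, host, hypervisor_hostname, binary, and deleted are modeled); the real Nova tables have more columns, and matching on host alone assumes one compute node per host. ORPHAN_QUERY, find_orphaned_compute_nodes, and build_demo_db are hypothetical names, and the demo data is invented.

```python
import sqlite3
from typing import List, Tuple

# Live compute_nodes rows whose host has no live nova-compute service.
ORPHAN_QUERY = """
SELECT cn.id, cn.host, cn.hypervisor_hostname
FROM compute_nodes AS cn
WHERE cn.deleted = 0
  AND NOT EXISTS (
      SELECT 1
      FROM services AS s
      WHERE s.host = cn.host
        AND s.binary = 'nova-compute'
        AND s.deleted = 0
  )
ORDER BY cn.id
"""


def find_orphaned_compute_nodes(
        conn: sqlite3.Connection) -> List[Tuple[int, str, str]]:
    """Read-only lookup; it does not modify any rows."""
    return conn.execute(ORPHAN_QUERY).fetchall()


def build_demo_db() -> sqlite3.Connection:
    """Toy data: compute-0's service is soft-deleted, compute-1 is live."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE services (id INTEGER PRIMARY KEY, host TEXT,'
                 ' "binary" TEXT, deleted INTEGER NOT NULL DEFAULT 0)')
    conn.execute('CREATE TABLE compute_nodes (id INTEGER PRIMARY KEY,'
                 ' host TEXT, hypervisor_hostname TEXT,'
                 ' deleted INTEGER NOT NULL DEFAULT 0)')
    conn.executemany(
        'INSERT INTO services (host, "binary", deleted) VALUES (?, ?, ?)',
        [("compute-0", "nova-compute", 1), ("compute-1", "nova-compute", 0)])
    conn.executemany(
        "INSERT INTO compute_nodes (host, hypervisor_hostname) VALUES (?, ?)",
        [("compute-0", "compute-0"), ("compute-1", "compute-1")])
    return conn
```

On the demo data, find_orphaned_compute_nodes(build_demo_db()) reports only the compute-0 row, since compute-1 still has a live nova-compute service.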
Additional context
- is related to: OSPRH-20811 Documentation bug: Incomplete procedure for removing compute nodes causes database integrity errors on redeployment. (Closed)