Bug
Resolution: Unresolved
rhos-18.0 FR 1 (Nov 2024)
Compute Next Sprint Candidates
Moderate
This was noticed during a code review related to cell deletion.
Here, the cell deletion logic starts the job that deletes the cell mapping from the cell0 DB but never waits for that job to finish. It moves straight on to delete the NovaCell CR representing the cell and also removes the cell from the Nova.status.RegisteredCells list.
I believe this opens a race window in the following scenario:
- user deletes cell2
- nova-operator starts the cell2 cell mapping deletion job and deletes NovaCell/cell2
- the job is slow to schedule to a worker or slow to run due to cell0 DB slowness
- user decides to (re)create cell2 as a new cell (maybe the user deleted cell2 because it failed somehow and wants to retry the cell creation by deleting and re-creating it)
- nova-operator creates the new NovaCell/cell2 and eventually starts the cell mapping job
- now the cell mapping deletion job and the cell mapping job for the same cell name (cell2) run in parallel. If the cell mapping job runs first, it sees and updates the existing mapping, and the cell mapping deletion job then simply removes that mapping. The result is a cell that is ready from the nova-operator perspective but unmapped from the OpenStack perspective.
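
The interleaving above can be reproduced with a small toy model. This is only a sketch to illustrate the ordering problem, not nova-operator code: the `Deployment` class, the job functions, and the `wait_for_job` flag are all hypothetical names invented for this example, and the "jobs" are plain function calls rather than Kubernetes Jobs.

```python
from dataclasses import dataclass, field


@dataclass
class Deployment:
    """Toy state; field names are illustrative, not the operator's real types."""
    cell_mappings: set = field(default_factory=set)     # cell mapping rows in the DB
    registered_cells: set = field(default_factory=set)  # stands in for Nova.status.RegisteredCells
    pending_jobs: list = field(default_factory=list)    # jobs started but not yet executed


def map_cell_job(dep, cell):
    # Creates the mapping, or leaves an existing one in place (an "update").
    dep.cell_mappings.add(cell)


def delete_mapping_job(dep, cell):
    # Removes the mapping for the given cell name, whoever created it.
    dep.cell_mappings.discard(cell)


def delete_cell(dep, cell, wait_for_job):
    """Operator-side deletion, with or without waiting for the deletion job."""
    if wait_for_job:
        delete_mapping_job(dep, cell)  # job completes before the cell is dropped
    else:
        dep.pending_jobs.append((delete_mapping_job, cell))  # started, not awaited
    dep.registered_cells.discard(cell)


def create_cell(dep, cell):
    map_cell_job(dep, cell)
    dep.registered_cells.add(cell)


def run_pending_jobs(dep):
    # Models the slow deletion job finally getting scheduled and executed.
    while dep.pending_jobs:
        job, cell = dep.pending_jobs.pop(0)
        job(dep, cell)


def scenario(wait_for_job):
    dep = Deployment()
    create_cell(dep, "cell2")
    delete_cell(dep, "cell2", wait_for_job)
    create_cell(dep, "cell2")  # user re-creates cell2 before the slow job has run
    run_pending_jobs(dep)
    # (registered with the operator?, mapped in the DB?)
    return "cell2" in dep.registered_cells, "cell2" in dep.cell_mappings


print(scenario(wait_for_job=False))  # (True, False): ready but unmapped
print(scenario(wait_for_job=True))   # (True, True)
```

In this model, waiting for the deletion job before removing the cell from the registered list closes the window. Whether that is the right fix for the operator, or whether the deletion job should instead be tied to a specific cell instance, is left open here.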