Loading...

XML

Word

Printable

To Reproduce

We did not reproduce the issue but based on our understanding and the event list:

Customer stopped the instances on a compute that had HW issue (compute service was down) before trying to evacute them
Then they evacuated the instances and they went up fine on new destination compute
When they fixed source compute HW issues and brought it back online, the instances that were evacuated got stopped
Customer manually started the instances successfully
We suspect that the stop action from 1. somehow got queued while the the source compute was down, and trigered unwanted actions when it came back up.

Expected behavior

After an evac, instances running on destination compute should not be impacted once the source compute is back online

Bug impact

Known workaround

Reboot instances bring the workload up again but no known workaround to prevent them from stopping until we know why the instances stopped

Additional context

If the stop action issued by the customer before the evacuate is responsible for this issue, I think it is a bug.
I will update with more information.