Bug
Resolution: Done
Undefined
None
4.16
Description of problem:
Under some circumstances, the live migration runs the MTU migration phase, which completes correctly. Then, during the second MCO rollout (the one that puts the target CNI in use), it tries to run the MTU migration phase again. This happens repeatedly and ultimately prevents the live migration from ever completing.
Version-Release number of selected component (if applicable):
Tested in-house on 4.16.19.
How reproducible:
Always under certain circumstances, sometimes otherwise.
Steps to Reproduce:
The following steps reproduce the issue every time, although there may be other ways to trigger it:
1. Start with a 4.16 cluster upgraded from 4.14 that uses the openshift-sdn plugin and has a custom machine config pool (which inherits the worker machineconfigs, as required).
2. Start the live migration to OVN-Kubernetes.
3. Once the MTU migration phase has completed for the first time, pause the custom machineconfigpool.
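For convenience, steps 2 and 3 can be sketched as the `oc` invocations below. This is only a sketch under assumptions: the pool name `custom` is a placeholder, and the patch that starts the live migration is what I believe the 4.16 documentation describes, so it should be checked against the docs for the exact version in use. The helpers only build and print the commands; they do not run anything against a cluster.

```python
import json
import shlex


def start_live_migration_cmd():
    """Build the `oc patch` argv that requests a live migration to OVN-Kubernetes.

    Assumption: the annotation and field names follow the 4.16 live migration
    documentation as I recall it; verify before use.
    """
    patch = {
        "metadata": {
            "annotations": {"network.openshift.io/network-type-migration": ""}
        },
        "spec": {"networkType": "OVNKubernetes"},
    }
    return [
        "oc", "patch", "Network.config.openshift.io", "cluster",
        "--type=merge", "--patch", json.dumps(patch),
    ]


def pause_mcp_cmd(pool_name, paused=True):
    """Build the `oc patch` argv that pauses (or unpauses) a machine config pool."""
    patch = {"spec": {"paused": paused}}
    return [
        "oc", "patch", "machineconfigpool", pool_name,
        "--type=merge", "--patch", json.dumps(patch),
    ]


if __name__ == "__main__":
    # Step 2: start the live migration.
    print(shlex.join(start_live_migration_cmd()))
    # Step 3: run this only after the first MTU migration phase has completed.
    # "custom" is a hypothetical pool name.
    print(shlex.join(pause_mcp_cmd("custom")))
```

Unpausing the pool afterwards would be `pause_mcp_cmd("custom", paused=False)`.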
Actual results:
The MTU migration phase is retried again and again.
Expected results:
The MTU migration phase should never be repeated once it has run for the first time. If an MCP is paused, the MCO rollout will stay on hold and the live migration should stay on hold with it. If no MCP is paused, the live migration should complete successfully. In either case, the MTU phase must never be attempted more than once.
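The expected behavior can be stated as a small decision rule. The sketch below is a hypothetical model written only to illustrate that rule; the names `MigrationState`, `Action`, and `next_action` are invented here and do not correspond to the actual operator code.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    RUN_MTU_MIGRATION = "run the MTU migration phase"
    HOLD = "stay on hold while an MCP is paused"
    CONTINUE = "continue the rollout of the target CNI"


@dataclass
class MigrationState:
    # True once the MTU migration phase has completed for the first time.
    mtu_phase_done: bool = False
    # Names of machine config pools that are currently paused.
    paused_pools: set = field(default_factory=set)


def next_action(state):
    """Return the expected next step for the live migration (illustrative only)."""
    if not state.mtu_phase_done:
        return Action.RUN_MTU_MIGRATION
    # From here on the MTU phase is finished and must never be selected again.
    if state.paused_pools:
        return Action.HOLD
    return Action.CONTINUE
```

The point of the model is that once `mtu_phase_done` is set, no combination of paused or unpaused pools leads back to `RUN_MTU_MIGRATION`; a paused pool can only delay the migration, never restart an earlier phase.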
Additional info:
This is a customer issue that can be reproduced with the steps above. More details on my analysis of the code behavior will be added in comments (any required data will be shared privately).
- clones: OCPBUGS-44338 SDN to OVN-K live migration runs MTU migration phase more than once and fails (POST)
- is depended on by: OCPBUGS-44338 SDN to OVN-K live migration runs MTU migration phase more than once and fails (POST)