OpenShift Bugs / OCPBUGS-47633

SDN to OVN-K live migration runs MTU migration phase more than once and fails


    • Severity: Important
    • None
    • Sprint: SDN Sprint 264, SDN Sprint 265, SDN Sprint 266
    • 3
    • False


      *Cause*: What actions or circumstances cause this bug to present.
      *Consequence*: What happens when the bug presents.
      *Fix*: What was done to fix the bug.
      *Result*: Bug doesn’t present anymore.
    • Bug Fix
    • In Progress

      This is a clone of issue OCPBUGS-44338. The following is the description of the original issue:

      Description of problem:

      Under some circumstances, the live migration runs the MTU migration phase, which completes successfully. Then, during the second MCO rollout, which makes the target CNI the one in use, the migration tries to run the MTU migration phase again. This repeats more than once, and ultimately the live migration never completes.

      Version-Release number of selected component (if applicable):

      Tested in-house in 4.16.19

      How reproducible:

      Always under the circumstances described in the steps below; intermittently otherwise.

      Steps to Reproduce:

      The following steps reproduce the issue 100% of the time, although they may not be the only way to trigger it:

      1. Start with a 4.16 cluster, upgraded from 4.14, that uses the openshift-sdn network plugin and has a custom machine config pool (which inherits the worker MachineConfigs, as required).

      2. Start the live migration to OVN-Kubernetes.

      3. Once the MTU migration phase has completed for the first time, pause the custom machine config pool.
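Steps 2 and 3 can be expressed as `oc` commands. The Python sketch below only prints them (a dry run) rather than executing them. The migration trigger (the `network.openshift.io/network-type-migration` annotation plus `spec.networkType: OVNKubernetes` on `Network.config.openshift.io/cluster`) is assumed to match the live migration procedure documented for OpenShift 4.16 and should be verified against the documentation for your exact release. Pausing uses the MachineConfigPool `spec.paused` field. The pool name `infra-custom` is hypothetical.

```python
import shlex

# Hypothetical name for the custom pool from step 1; replace with your own.
CUSTOM_POOL = "infra-custom"

# Step 2: trigger the SDN -> OVN-Kubernetes live migration.
# Assumed to match the 4.16 live migration docs; verify for your release.
START_MIGRATION = [
    "oc", "patch", "Network.config.openshift.io", "cluster", "--type=merge",
    "--patch",
    '{"metadata":{"annotations":{"network.openshift.io/network-type-migration":""}},'
    '"spec":{"networkType":"OVNKubernetes"}}',
]


def pause_pool(pool: str) -> list[str]:
    """Step 3: pause the given MachineConfigPool by setting spec.paused."""
    return ["oc", "patch", "mcp", pool, "--type=merge",
            "--patch", '{"spec":{"paused":true}}']


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    print(shlex.join(START_MIGRATION))
    print("# wait until the MTU migration phase has completed, then:")
    print(shlex.join(pause_pool(CUSTOM_POOL)))
```

Printing instead of running keeps the sketch safe to execute anywhere; the wait between the two commands is left manual because the report does not specify how completion of the MTU phase was observed.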

      Actual results:

      The MTU migration phase is retried repeatedly.

      Expected results:

      The MTU migration phase should never be repeated after it has run once. If any MCP is paused, the MCO rollout stays on hold and the live migration should stay on hold with it. If no MCP is paused, the live migration should complete successfully. In neither case should the MTU phase run more than once.
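The expected behavior amounts to a forward-only sequence of phases in which the MTU migration phase is entered at most once, and a paused pool only holds the current phase. The Python sketch below models that invariant under simplifying assumptions: `Phase`, `MigrationState`, and `reconcile` are hypothetical names, the phases are reduced to the ones mentioned in this report, and this is not the cluster-network-operator's actual implementation.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Phase(Enum):
    NOT_STARTED = auto()
    MTU_MIGRATION = auto()
    TARGET_CNI_ROLLOUT = auto()  # second MCO rollout: target CNI becomes in use
    COMPLETE = auto()


@dataclass
class MigrationState:
    phase: Phase = Phase.NOT_STARTED
    mtu_runs: int = 0  # how many times the MTU phase has been entered


def reconcile(state: MigrationState, pool_paused: bool, rollout_complete: bool) -> MigrationState:
    """Advance at most one phase per call; never move backwards (hypothetical model)."""
    if state.phase is Phase.NOT_STARTED:
        state.phase = Phase.MTU_MIGRATION
        state.mtu_runs += 1
    elif state.phase is Phase.MTU_MIGRATION:
        if rollout_complete and not pool_paused:
            state.phase = Phase.TARGET_CNI_ROLLOUT
    elif state.phase is Phase.TARGET_CNI_ROLLOUT:
        # A paused pool holds the rollout, so the migration waits here;
        # there is deliberately no transition back to MTU_MIGRATION.
        if rollout_complete and not pool_paused:
            state.phase = Phase.COMPLETE
    return state


if __name__ == "__main__":
    s = MigrationState()
    reconcile(s, pool_paused=False, rollout_complete=False)  # enter MTU phase
    reconcile(s, pool_paused=False, rollout_complete=True)   # MTU phase done
    for _ in range(3):  # custom pool paused during the second rollout
        reconcile(s, pool_paused=True, rollout_complete=False)
    print(s.phase.name, s.mtu_runs)  # TARGET_CNI_ROLLOUT 1
    reconcile(s, pool_paused=False, rollout_complete=True)   # pool unpaused
    print(s.phase.name, s.mtu_runs)  # COMPLETE 1
```

Because the only way into `MTU_MIGRATION` is from `NOT_STARTED`, `mtu_runs` can never exceed 1 no matter how long a pool stays paused, which is the property the reported behavior violates.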

      Additional info:

      This is a customer issue that can be reproduced with the steps above. Further details of my analysis of the code behavior will be posted in comments (any required data will be shared privately).

              Peng Liu (pliurh)
              OpenShift Prow Bot (openshift-crt-jira-prow)
              Anurag Saxena
