  1. Migration Toolkit for Virtualization
  2. MTV-3239

Performance regression in 2.9 with many plans


    • Bug
    • Resolution: Unresolved
    • Major
    • None
    • 2.9.3
    • Controller
    • None
    • Incidents & Support
    • False
    • True
    • Moderate
    • Customer Reported

      Compare the behaviour of the forklift-controller (main container), where the control loop that watches the plans runs, between versions 2.9.3 and 2.8.7.

      Scenario 1: 1000 plans (plan-1 to plan-1000), 1 done, 999 ready but not started

      Version 2.9.3

      It keeps looping through all plans all the time. It does so much work that it takes almost 3m (about 2m43s in the log below) to get back to the sample plan:

      {"level":"info","ts":"2025-08-27 23:36:11.340","logger":"plan|dtsh5","msg":"Reconcile ended.","plan":{"name":"plan-999","namespace":"openshift-mtv"},"reQ":3}
      {"level":"info","ts":"2025-08-27 23:38:54.995","logger":"plan|sbpj6","msg":"Reconcile ended.","plan":{"name":"plan-999","namespace":"openshift-mtv"},"reQ":3}
      {"level":"info","ts":"2025-08-27 23:41:38.065","logger":"plan|mcfz5","msg":"Reconcile ended.","plan":{"name":"plan-999","namespace":"openshift-mtv"},"reQ":3}
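      The gap between reconciles can be measured directly from the controller log. The following is a minimal sketch, assuming every log line is a JSON object shaped like the three samples above; the function name reconcile_gaps is hypothetical and not part of forklift.

```python
import json
from datetime import datetime

# The three sample lines from the 2.9.3 controller log above.
LOG_LINES = [
    '{"level":"info","ts":"2025-08-27 23:36:11.340","logger":"plan|dtsh5","msg":"Reconcile ended.","plan":{"name":"plan-999","namespace":"openshift-mtv"},"reQ":3}',
    '{"level":"info","ts":"2025-08-27 23:38:54.995","logger":"plan|sbpj6","msg":"Reconcile ended.","plan":{"name":"plan-999","namespace":"openshift-mtv"},"reQ":3}',
    '{"level":"info","ts":"2025-08-27 23:41:38.065","logger":"plan|mcfz5","msg":"Reconcile ended.","plan":{"name":"plan-999","namespace":"openshift-mtv"},"reQ":3}',
]


def reconcile_gaps(lines, plan_name):
    """Return the seconds between consecutive 'Reconcile ended.' entries of one plan.

    Hypothetical helper: assumes one JSON object per line with the
    'ts', 'msg' and 'plan.name' fields seen in the samples.
    """
    stamps = []
    for line in lines:
        entry = json.loads(line)
        if entry.get("msg") != "Reconcile ended.":
            continue
        if entry.get("plan", {}).get("name") != plan_name:
            continue
        stamps.append(datetime.strptime(entry["ts"], "%Y-%m-%d %H:%M:%S.%f"))
    stamps.sort()
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]


gaps = reconcile_gaps(LOG_LINES, "plan-999")
for gap in gaps:
    print(f"{gap:.3f}s between reconciles of plan-999")
```

      For the sample lines this gives two gaps of roughly 163s each, which matches the "almost 3m" observation. The same helper can be pointed at the Scenario 2 lines to compare.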

      While it does not reach its resource limits, it has higher CPU usage than 2.8.7 (below).

      Version 2.8.7

      It's quiet. It doesn't keep reconciling all the plans all the time; it reconciles a plan only while that plan is running.

      It has lower CPU usage compared to 2.9.3

      Scenario 2: 1000 plans, 1 done, 900 archived:

      Version 2.9.3

      It gets more efficient and speeds up, but it still takes considerable time (roughly 16s to 22s between reconciles of the same plan):

      {"level":"info","ts":"2025-08-28 00:36:22.852","logger":"plan|hbjs8","msg":"Reconcile ended.","plan":{"name":"plan-999","namespace":"openshift-mtv"},"reQ":3}
      {"level":"info","ts":"2025-08-28 00:36:39.063","logger":"plan|vtt7g","msg":"Reconcile ended.","plan":{"name":"plan-999","namespace":"openshift-mtv"},"reQ":3}
      {"level":"info","ts":"2025-08-28 00:37:01.054","logger":"plan|pcn7k","msg":"Reconcile ended.","plan":{"name":"plan-999","namespace":"openshift-mtv"},"reQ":3} 

      Version 2.8.7

      Again, quiet, efficient.

      Why is this a problem:

      • Those almost 3 minutes between loops cause big delays in itinerary transitions, because it takes that long for the controller to get back to the plan and move it forward.
      • It adds up quickly: every itinerary transition now has to wait ~3m for the controller to check the plan again, even if the plan was ready long before that.
      • See the transitions below with 500 plans, where there is a ~1m30s delay until the controller gets back to the same plan. A simple initialization takes almost 10m:
      2025-08-27T07:56:46.533533073Z current phase":"Started","next phase":"CreateInitialSnapshot"}
      2025-08-27T07:58:38.882542907Z current phase":"CreateInitialSnapshot","next phase":"WaitForInitialSnapshot"}
      2025-08-27T08:01:00.329393856Z current phase":"WaitForInitialSnapshot","next phase":"StoreInitialSnapshotDeltas"}
      2025-08-27T08:02:54.348414029Z current phase":"StoreInitialSnapshotDeltas","next phase":"CreateDataVolumes"}
      2025-08-27T08:04:46.525918437Z current phase":"CreateDataVolumes","next phase":"WaitForDataVolumesStatus"}
      2025-08-27T08:06:38.039194719Z current phase":"WaitForDataVolumesStatus","next phase":"CopyDisks"}
      • If the customer clicks Start in the UI, the system seems unresponsive, as the plan doesn't move for another 1m30s with 500 plans (or 3m with 1000 plans).
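      The per-transition delay and the total initialization time can be worked out from the timestamps in the transition log above. This is a minimal sketch that hard-codes those six timestamps and phase names; the helper parse_ts is hypothetical, and it truncates the nanosecond timestamps to microseconds so that datetime.strptime can parse them.

```python
from datetime import datetime

# (timestamp, current phase) pairs copied from the transition log above.
TRANSITIONS = [
    ("2025-08-27T07:56:46.533533073Z", "Started"),
    ("2025-08-27T07:58:38.882542907Z", "CreateInitialSnapshot"),
    ("2025-08-27T08:01:00.329393856Z", "WaitForInitialSnapshot"),
    ("2025-08-27T08:02:54.348414029Z", "StoreInitialSnapshotDeltas"),
    ("2025-08-27T08:04:46.525918437Z", "CreateDataVolumes"),
    ("2025-08-27T08:06:38.039194719Z", "WaitForDataVolumesStatus"),
]


def parse_ts(ts):
    """Parse an RFC 3339 timestamp with nanoseconds (hypothetical helper).

    Keeps the first 26 characters, i.e. the date, time and six fractional
    digits, because %f accepts at most microsecond precision.
    """
    return datetime.strptime(ts[:26], "%Y-%m-%dT%H:%M:%S.%f")


stamps = [parse_ts(ts) for ts, _ in TRANSITIONS]
phases = [phase for _, phase in TRANSITIONS]

# Seconds spent waiting between each logged transition and the next one.
gaps = [(later - earlier).total_seconds()
        for earlier, later in zip(stamps, stamps[1:])]
total = (stamps[-1] - stamps[0]).total_seconds()

for phase, gap in zip(phases[1:], gaps):
    print(f"{gap:7.1f}s until the transition out of {phase}")
print(f"{total:7.1f}s from first to last transition")
```

      On these samples every gap comes out above 100s and the total is a little under 10 minutes, consistent with the "almost 10m" figure above.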

       

      Version-Release number of selected component (if applicable):

      2.9.3

      How reproducible:

      Always
      

      Steps to Reproduce:

      1. Create 1000 plans
      2. Observe forklift-controller main container
      3. Start one migration
      4. Observe the delays on transition times

      Actual results:

      Performance regression in 2.9.3

      Expected results:

      Same as 2.8.7

      Additional info:

      Archiving plans is a workaround (see Scenario 2, where archiving 900 plans shortens the loop).

              rh-ee-ehazan Elad Hazan
              rhn-support-gveitmic Germano Veit Michel