-
Bug
-
Resolution: Unresolved
-
Major
-
None
-
2.9.3
-
None
-
Incidents & Support
-
False
-
-
True
-
-
-
Moderate
-
Customer Reported
The forklift-controller main container, where the control loop that watches plans runs, behaves differently between versions.
Scenario 1: 1000 plans (plan-1 to plan-1000), 1 done, 999 ready but not started
Version 2.9.3
It keeps looping through all plans continuously, doing so much work that it takes almost 3m to get back to the same plan:
{"level":"info","ts":"2025-08-27 23:36:11.340","logger":"plan|dtsh5","msg":"Reconcile ended.","plan":{"name":"plan-999","namespace":"openshift-mtv"},"reQ":3}
{"level":"info","ts":"2025-08-27 23:38:54.995","logger":"plan|sbpj6","msg":"Reconcile ended.","plan":{"name":"plan-999","namespace":"openshift-mtv"},"reQ":3}
{"level":"info","ts":"2025-08-27 23:41:38.065","logger":"plan|mcfz5","msg":"Reconcile ended.","plan":{"name":"plan-999","namespace":"openshift-mtv"},"reQ":3}
While it does not reach the limits, it has higher CPU usage than 2.8.7 (below).
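The gap between consecutive reconciles of the same plan can be measured directly from the controller log. The following is a minimal sketch, assuming the log lines have exactly the JSON shape shown above; the function name `reconcile_gaps` is illustrative and not part of forklift.

```python
import json
from datetime import datetime

# The three "Reconcile ended." entries for plan-999 quoted above (2.9.3, Scenario 1).
LOG_LINES = [
    '{"level":"info","ts":"2025-08-27 23:36:11.340","logger":"plan|dtsh5","msg":"Reconcile ended.","plan":{"name":"plan-999","namespace":"openshift-mtv"},"reQ":3}',
    '{"level":"info","ts":"2025-08-27 23:38:54.995","logger":"plan|sbpj6","msg":"Reconcile ended.","plan":{"name":"plan-999","namespace":"openshift-mtv"},"reQ":3}',
    '{"level":"info","ts":"2025-08-27 23:41:38.065","logger":"plan|mcfz5","msg":"Reconcile ended.","plan":{"name":"plan-999","namespace":"openshift-mtv"},"reQ":3}',
]


def reconcile_gaps(lines, plan_name):
    """Return the seconds between consecutive 'Reconcile ended.' entries for one plan."""
    stamps = []
    for line in lines:
        entry = json.loads(line)
        if entry.get("msg") != "Reconcile ended.":
            continue
        if entry.get("plan", {}).get("name") != plan_name:
            continue
        # Timestamp format as it appears in the log: "YYYY-MM-DD HH:MM:SS.mmm"
        stamps.append(datetime.strptime(entry["ts"], "%Y-%m-%d %H:%M:%S.%f"))
    stamps.sort()
    return [(later - earlier).total_seconds() for earlier, later in zip(stamps, stamps[1:])]


gaps = reconcile_gaps(LOG_LINES, "plan-999")
for gap in gaps:
    print(f"{gap:.3f}s between reconciles of plan-999")
```

For the entries above this gives gaps of roughly 163s each, which is the "almost 3m" loop time. The same function can be pointed at a full controller log to compare versions or plan counts.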

Version 2.8.7
It's quiet: it doesn't keep reconciling all the plans all the time, only while a plan is running.
It has lower CPU usage than 2.9.3.

Scenario 2: 1000 plans, 1 done, 900 archived:
Version 2.9.3
It gets more efficient and speeds up, but still takes considerable time:
{"level":"info","ts":"2025-08-28 00:36:22.852","logger":"plan|hbjs8","msg":"Reconcile ended.","plan":{"name":"plan-999","namespace":"openshift-mtv"},"reQ":3}
{"level":"info","ts":"2025-08-28 00:36:39.063","logger":"plan|vtt7g","msg":"Reconcile ended.","plan":{"name":"plan-999","namespace":"openshift-mtv"},"reQ":3}
{"level":"info","ts":"2025-08-28 00:37:01.054","logger":"plan|pcn7k","msg":"Reconcile ended.","plan":{"name":"plan-999","namespace":"openshift-mtv"},"reQ":3}
Version 2.8.7
Again, quiet, efficient.
Why is this a problem:
- The almost 3 minutes between loops cause big delays in itinerary transitions, as it takes that long for the controller to get back to the plan and move it forward.
- It adds up quickly: every itinerary transition now has to wait ~3m for the controller to check the plan again, even if it was ready long before that.
- Look at the transitions below with 500 plans, where there is a ~1m30s delay until the controller gets to the same plan again. A simple initialize takes almost 10m:
2025-08-27T07:56:46.533533073Z current phase":"Started","next phase":"CreateInitialSnapshot"}
2025-08-27T07:58:38.882542907Z current phase":"CreateInitialSnapshot","next phase":"WaitForInitialSnapshot"}
2025-08-27T08:01:00.329393856Z current phase":"WaitForInitialSnapshot","next phase":"StoreInitialSnapshotDeltas"}
2025-08-27T08:02:54.348414029Z current phase":"StoreInitialSnapshotDeltas","next phase":"CreateDataVolumes"}
2025-08-27T08:04:46.525918437Z current phase":"CreateDataVolumes","next phase":"WaitForDataVolumesStatus"}
2025-08-27T08:06:38.039194719Z current phase":"WaitForDataVolumesStatus","next phase":"CopyDisks"}
- If the customer clicks Start in the UI, the system seems unresponsive, as nothing moves for another 1m30s (500 plans) or 3m (1000 plans).
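The per-transition wait and the total initialize time can be computed from the transition log quoted above. This is a rough sketch that assumes the truncated line format shown in this report; the regular expression and the helper name `parse_transitions` are illustrative only.

```python
import re
from datetime import datetime

# Phase transitions quoted above (500 plans), kept in their truncated form.
TRANSITIONS = '''
2025-08-27T07:56:46.533533073Z current phase":"Started","next phase":"CreateInitialSnapshot"}
2025-08-27T07:58:38.882542907Z current phase":"CreateInitialSnapshot","next phase":"WaitForInitialSnapshot"}
2025-08-27T08:01:00.329393856Z current phase":"WaitForInitialSnapshot","next phase":"StoreInitialSnapshotDeltas"}
2025-08-27T08:02:54.348414029Z current phase":"StoreInitialSnapshotDeltas","next phase":"CreateDataVolumes"}
2025-08-27T08:04:46.525918437Z current phase":"CreateDataVolumes","next phase":"WaitForDataVolumesStatus"}
2025-08-27T08:06:38.039194719Z current phase":"WaitForDataVolumesStatus","next phase":"CopyDisks"}
'''

PATTERN = re.compile(
    r'(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\.(\d+)Z '
    r'current phase":"(\w+)","next phase":"(\w+)"'
)


def parse_transitions(text):
    """Return (timestamp, current_phase, next_phase) tuples in log order."""
    rows = []
    for match in PATTERN.finditer(text):
        base, fraction, current, nxt = match.groups()
        # The log has nanosecond precision; datetime keeps microseconds, so truncate.
        stamp = datetime.strptime(f"{base}.{fraction[:6]}", "%Y-%m-%dT%H:%M:%S.%f")
        rows.append((stamp, current, nxt))
    return rows


rows = parse_transitions(TRANSITIONS)

# Time spent in each phase before the controller moved the plan to the next one.
waits = [
    (later[1], (later[0] - earlier[0]).total_seconds())
    for earlier, later in zip(rows, rows[1:])
]
total = (rows[-1][0] - rows[0][0]).total_seconds()

for phase, seconds in waits:
    print(f"{phase}: {seconds:.1f}s")
print(f"total: {total:.1f}s")
```

On this data every phase lasts between roughly 110s and 145s, close to the ~1m30s loop time reported for 500 plans, and the total from Started to CopyDisks is just under 10 minutes.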
Version-Release number of selected component (if applicable):
2.9.3
How reproducible:
Always
Steps to Reproduce:
1. Create 1000 plans
2. Observe forklift-controller main container
3. Start one migration
4. Observe the delays on transition times
Actual results:
Performance regression in 2.9.3
Expected results:
Same as 2.8.7
Additional info:
Archiving the plan is a workaround
- depends on
MTV-3475 Hit unexpected MacConflicts error (MODIFIED)