-
Task
-
Resolution: Unresolved
-
Minor
Intro
We have observed forklift-controller run out of memory (OOM) and ultimately enter CrashLoopBackOff when migrating a large number of VMs from VMware (e.g., 100+ VMs across 20+ plans). We need a new performance test that reproduces these conditions systematically so that we can measure resource usage, identify bottlenecks, and verify forklift-controller stability during high-volume migrations.
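As a starting point for the test workload, the sketch below spreads a list of VM names evenly across a fixed number of migration plans, matching the 100 VM / 20 plan shape described above. The `perf-vm-` and `perf-plan-` names and the `split_vms_into_plans` helper are hypothetical, and the round-robin split is an assumption; the real test may need to group VMs differently.

```python
from typing import Dict, List


def split_vms_into_plans(vm_names: List[str], plan_count: int) -> Dict[str, List[str]]:
    """Distribute VM names round-robin across plan_count plans.

    Plan names (perf-plan-NN) are hypothetical placeholders for this sketch.
    """
    if plan_count < 1:
        raise ValueError("plan_count must be at least 1")
    plans: Dict[str, List[str]] = {f"perf-plan-{i:02d}": [] for i in range(plan_count)}
    plan_names = list(plans)
    for index, vm in enumerate(vm_names):
        # Round-robin keeps plan sizes within one VM of each other.
        plans[plan_names[index % plan_count]].append(vm)
    return plans


# Hypothetical workload: 100 VMs spread over 20 plans (5 VMs per plan).
vms = [f"perf-vm-{i:03d}" for i in range(100)]
plans = split_vms_into_plans(vms, 20)
```

Varying the VM count and `plan_count` independently should help show whether memory growth tracks the number of VMs, the number of plans, or both.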
Background
Environment
- OCP+Virt clusters running as guest VMs on top of OCP+Virt deployed on IBM bare metal.
- Mostly external Ceph storage (external ODF) in use.
Observed Issues
- forklift-controller runs out of memory (OOM) with 40–100 concurrent VM migrations.
- forklift-volume-populator-controller reports PVC references that “no longer exist.”
- Alerts for ODF external Ceph performance degradation, network saturation caused by retransmissions, and the RBD plugin provisioner struggling to keep up with the provisioning load.
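To flag the OOM and CrashLoopBackOff conditions automatically during a test run, a check along the following lines could be applied to the pod JSON returned by `kubectl get pod <name> -o json`. This is a minimal sketch: `find_oom_containers` is a hypothetical helper, and the sample pod and its container name are illustrative rather than taken from a real cluster. It relies only on the standard Kubernetes `status.containerStatuses` fields.

```python
from typing import Any, Dict, List


def find_oom_containers(pod: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return containers that were OOM-killed or are in CrashLoopBackOff."""
    findings = []
    statuses = (pod.get("status") or {}).get("containerStatuses") or []
    for cs in statuses:
        # Reason the previous container instance terminated, if any.
        last = ((cs.get("lastState") or {}).get("terminated") or {}).get("reason")
        # Reason the current container instance is waiting, if any.
        waiting = ((cs.get("state") or {}).get("waiting") or {}).get("reason")
        if last == "OOMKilled" or waiting == "CrashLoopBackOff":
            findings.append({
                "container": cs.get("name"),
                "last_terminated_reason": last,
                "waiting_reason": waiting,
                "restart_count": cs.get("restartCount", 0),
            })
    return findings


# Illustrative pod status; the container name is an assumption for this sketch.
sample_pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "main",
                "restartCount": 7,
                "state": {"waiting": {"reason": "CrashLoopBackOff"}},
                "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}},
            },
            {"name": "sidecar", "restartCount": 0, "state": {"running": {}}},
        ]
    }
}
findings = find_oom_containers(sample_pod)
```

Polling this check at a fixed interval and recording the first time it returns a finding gives a simple marker for when the controller fell over relative to migration progress.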
- relates to: MTV-1343 forklift-controller OOM during 100 VM/20+ plan migration from VMware (New)