Type: Bug
Resolution: Unresolved
Priority: Normal
Affects Version: CNV v4.17.7
Component: Quality / Stability / Reliability
Severity: Important
Description of problem:
With a very high memory overcommit ratio (so that each VM's memory request is very small relative to its memory size), wasp-agent evicts VMs more aggressively than it does at a lower ratio, even though the node is in no danger of exhausting its memory+swap resources. At lower ratios the failure mode is instead that excess VMs simply fail to be created for lack of resources, with no evictions of running VMs.

For example, on a node with 192 GiB RAM and 1.7 TiB swap, running VMs with 16 GiB of memory assigned and a workload consuming 14 GiB: with the overcommit ratio set to 1100, about 100 VMs are created, and attempts to create more simply fail without affecting the running VMs. With the ratio set to 1200, after about 65 VMs are created and start running, some of them are evicted in order to reclaim memory. Raising the ratio further, or manually setting smaller requests, leaves even fewer VMs running before eviction: 1300, for example, yields only 50 successfully running VMs, and 1500 only 36. If I delete the wasp namespace (and hence the wasp-agent daemonset) while leaving the OCI hook in place, I am able to create many more VMs without eviction.
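For reference, the arithmetic connecting the overcommit percentage to the per-VM memory request. The formula is an assumption based on how higher workload density is described as scaling requests (request ~ VM memory * 100 / overcommit percentage); the numbers are purely illustrative:

  # Assumed: request ~ VM memory * 100 / overcommit percentage, for a 16 GiB VM.
  echo "$((16 * 1024 * 100 / 1100)) MiB"   # ratio 1100 -> ~1489 MiB request (~100 VMs run)
  echo "$((16 * 1024 * 100 / 1200)) MiB"   # ratio 1200 -> ~1365 MiB request (evictions begin)
  echo "$((16 * 1024 * 100 / 1500)) MiB"   # ratio 1500 -> ~1092 MiB request (only ~36 VMs run)

If that assumption holds, the requests at which evictions appear are in the same range as the ~1536 MiB request threshold noted under Additional info.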
Version-Release number of selected component (if applicable):
4.17.7
How reproducible:
Consistently; the exact number of VMs may vary very slightly from run to run.
Steps to Reproduce:
1. Install wasp-agent on a suitably large machine (this was tested on a system with 192 GiB RAM and 1.6 TiB swap on an NVMe device, which should allow something over 100 VMs of the size described below).
2. Set the swap threshold to 0.99, and the swapin/swapout settings to a very large number, so that there will be no rate-based eviction.
3. Set memoryOvercommitPercentage to 1500 in the HyperConverged object (a CLI sketch for steps 2 and 3 follows this list).
4. git clone https://github.com/RobertKrawitz/OpenShift4-tools
5. cd OpenShift4-tools
6. Run the following:
./clusterbuster --precleanup --workload=memory --deployment-type=vm --memory-scan=1 --vm-memory=16Gi --processes=1 --deployments=60 --artifactdir=memory-nowasp-nvme-60-%s --workload-runtime=180 --memory-size=14Gi --vm-run-strategy=Manual --memory-iteration-runtime=180 --retrieve-successful-logs=1 --pod_start_timeout=300 --pin-node=client=<worker_with_swap> --timeout=3600
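A sketch of how steps 2 and 3 can be applied from the CLI. The HyperConverged field follows the higher-workload-density documentation; the wasp-agent variable names are assumptions based on its deployment manifest and may differ by release:

  # Step 3: set the overcommit percentage on the HyperConverged CR.
  oc patch -n openshift-cnv HyperConverged/kubevirt-hyperconverged --type merge \
    -p '{"spec":{"higherWorkloadDensity":{"memoryOvercommitPercentage":1500}}}'

  # Step 2: thresholds are environment variables on the wasp-agent DaemonSet
  # (variable names assumed; namespace and DaemonSet name taken from this report).
  oc set env -n wasp daemonset/wasp-agent \
    SWAP_UTILIZATION_THRESHOLD_FACTOR=0.99 \
    MAX_AVERAGE_SWAP_IN_PAGES_PER_SECOND=1000000000 \
    MAX_AVERAGE_SWAP_OUT_PAGES_PER_SECOND=1000000000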
Actual results:
Test fails with one or more VMs evicted.
Expected results:
Test runs to completion and generates results.
Additional info:
* The above workload creates 16 GiB VMs, each running a workload that scans a 14 GiB block of RAM.
* If the memoryOvercommitRatio is set to 1000, this test passes.
* Node logs report eviction to reclaim memory.
* The wasp-agent log from the wasp-agent pod on the node in question contains no information about the eviction.
* If wasp-agent is installed and `oc delete ns wasp` is then run, the test runs successfully.
* The same effect is observed using VMs or burstable pods with explicit memory requests assigned (in this case, if the memory request is less than 1536 MiB, the number of VMs successfully created declines).
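For triage, a few standard commands that may help correlate the evictions with node state (the event reason filter is an assumption about how this eviction is reported; nothing here is wasp-agent-specific):

  # Eviction events across the cluster:
  oc get events -A --field-selector reason=Evicted
  # Node memory usage around eviction time:
  oc adm top node <worker_with_swap>
  # On the node itself, confirm swap headroom (expect ample free swap):
  free -h
  swapon --show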
Is related to:
CNV-59435: With wasp-agent installed, burstable pods are able to use swap (status: New)