Type: Bug
Resolution: Unresolved
Priority: Normal
Affects Version: CNV v4.17.7
Component: Quality / Stability / Reliability
Severity: Important
Description of problem:
With a very high memory overcommit ratio (so that each VM's memory request is very small relative to its memory size), wasp-agent evicts VMs more aggressively than it does at a lower ratio, even though the node is in no danger of exhausting its memory+swap resources. At lower ratios the failure mode is instead that excess VMs simply fail to be created for lack of resources, with no evictions of running VMs.

For example, on a node with 192 GiB RAM and 1.7 TiB swap, running VMs with 16 GiB of memory assigned and a workload consuming 14 GiB: with the overcommit ratio set to 1100, about 100 VMs are created, and attempts to create more simply fail without affecting the running VMs. With the ratio set to 1200, after about 65 VMs are created and start running, some of them are evicted in order to reclaim memory. Raising the ratio further, or manually setting smaller requests, leaves even fewer VMs running before eviction: 1300, for example, yields only 50 successfully running VMs, and 1500 only 36. If I delete the wasp namespace (and hence the wasp-agent daemonset) while leaving the OCI hook in place, I am able to create many more VMs without eviction.
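For reference, the arithmetic connecting the overcommit percentage to the per-VM memory request. The formula is an assumption based on how higher workload density is described as scaling requests (request ~ VM memory * 100 / overcommit percentage); the numbers are purely illustrative:

  # Assumed: request ~ VM memory * 100 / overcommit percentage, for a 16 GiB VM.
  echo "$((16 * 1024 * 100 / 1100)) MiB"   # ratio 1100 -> ~1489 MiB request (~100 VMs run)
  echo "$((16 * 1024 * 100 / 1200)) MiB"   # ratio 1200 -> ~1365 MiB request (evictions begin)
  echo "$((16 * 1024 * 100 / 1500)) MiB"   # ratio 1500 -> ~1092 MiB request (only ~36 VMs run)

If that assumption holds, the requests at which evictions appear are in the same range as the ~1536 MiB request threshold noted under Additional info.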
Version-Release number of selected component (if applicable):
4.17.7
How reproducible:
Consistently; the exact number of VMs may vary very slightly from run to run.
Steps to Reproduce:
1. Install wasp-agent on a suitably large machine (this was tested on a system with 192 GiB RAM and 1.6 TiB swap on an NVMe device, which should allow something over 100 VMs of the size described below).
2. Set the swap threshold to 0.99, and the swapin/swapout settings to a very large number, so that there will be no rate-based eviction.
3. Set memoryOvercommitPercentage to 1500 in the HyperConverged object (a CLI sketch for steps 2 and 3 follows this list).
4. git clone https://github.com/RobertKrawitz/OpenShift4-tools
5. cd OpenShift4-tools
6. Run the following:
./clusterbuster --precleanup --workload=memory --deployment-type=vm --memory-scan=1 --vm-memory=16Gi --processes=1 --deployments=60 --artifactdir=memory-nowasp-nvme-60-%s --workload-runtime=180 --memory-size=14Gi --vm-run-strategy=Manual --memory-iteration-runtime=180 --retrieve-successful-logs=1 --pod_start_timeout=300 --pin-node=client=<worker_with_swap> --timeout=3600
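A sketch of how steps 2 and 3 can be applied from the CLI. The HyperConverged field follows the higher-workload-density documentation; the wasp-agent variable names are assumptions based on its deployment manifest and may differ by release:

  # Step 3: set the overcommit percentage on the HyperConverged CR.
  oc patch -n openshift-cnv HyperConverged/kubevirt-hyperconverged --type merge \
    -p '{"spec":{"higherWorkloadDensity":{"memoryOvercommitPercentage":1500}}}'

  # Step 2: thresholds are environment variables on the wasp-agent DaemonSet
  # (variable names assumed; namespace and DaemonSet name taken from this report).
  oc set env -n wasp daemonset/wasp-agent \
    SWAP_UTILIZATION_THRESHOLD_FACTOR=0.99 \
    MAX_AVERAGE_SWAP_IN_PAGES_PER_SECOND=1000000000 \
    MAX_AVERAGE_SWAP_OUT_PAGES_PER_SECOND=1000000000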
Actual results:
Test fails with one or more VMs evicted.
Expected results:
Test runs to completion and generates results.
Additional info:
* The above workload creates 16 GiB VMs, each running a workload that scans a 14 GiB block of RAM.
* If the memoryOvercommitRatio is set to 1000, this test passes.
* Node logs report eviction to reclaim memory.
* The wasp-agent log from the wasp-agent pod on the node in question contains no information about the eviction.
* If wasp-agent is installed and `oc delete ns wasp` is then run, the test runs successfully.
* The same effect is observed using VMs or burstable pods with explicit memory requests assigned (in this case, if the memory request is less than 1536 MiB, the number of VMs successfully created declines).
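For triage, a few standard commands that may help correlate the evictions with node state (the event reason filter is an assumption about how this eviction is reported; nothing here is wasp-agent-specific):

  # Eviction events across the cluster:
  oc get events -A --field-selector reason=Evicted
  # Node memory usage around eviction time:
  oc adm top node <worker_with_swap>
  # On the node itself, confirm swap headroom (expect ample free swap):
  free -h
  swapon --show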
Is related to:
CNV-59435: With wasp-agent installed, burstable pods are able to use swap (status: New)