[Issue metadata] Bug | Resolution: Unresolved | Blocker | rhel-9.5 | Low | rhel-net-perf | ssg_core_services | _N&P-Refined_
We have a customer who reports unbound consistently eating up all available memory and triggering an OOM. The customer is running RHEL 9.5 and states that the same configuration works without issues on a CentOS 7 server. The system has a 4-core CPU and 16GB of RAM. The settings that appear to influence this behaviour are:
num-threads: 4
so-rcvbuf: 2m
so-sndbuf: 2m
outgoing-num-tcp: 1000
incoming-num-tcp: 1000
msg-cache-size: 1G
rrset-cache-size: 2G
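As a diagnostic aid (not something the customer has run yet), a minimal sketch that compares unbound's own cache accounting against the configured limits; it assumes the remote control interface is enabled (control-enable: yes) and reads the mem.cache.rrset / mem.cache.message counters that unbound-control stats_noreset reports:

#!/usr/bin/env python3
# Sketch: compare unbound's reported cache usage with the configured limits.
# Assumes unbound-control works on this host (control-enable: yes).
import subprocess

LIMITS = {
    "mem.cache.message": 1 * 1024**3,  # msg-cache-size: 1G
    "mem.cache.rrset":   2 * 1024**3,  # rrset-cache-size: 2G
}

# stats_noreset prints key=value lines without clearing the counters.
out = subprocess.run(["unbound-control", "stats_noreset"],
                     capture_output=True, text=True, check=True).stdout
stats = dict(line.split("=", 1) for line in out.splitlines() if "=" in line)

for key, limit in LIMITS.items():
    used = int(stats.get(key, 0))
    print(f"{key}: {used / 1024**2:.0f} MiB used of {limit / 1024**2:.0f} MiB configured")

If the caches stay within their limits while the process RSS keeps growing, that would point at memory outside the two caches.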
We ran some tests with lower limits and lower system resources, and eventually saw the same behaviour. Upstream provides a rough formula to estimate memory consumption based on the cache sizes, which essentially boils down to:
2.5 * (rrset-cache-size + msg-cache-size)
With the configured values (2G + 1G) this gives roughly 7.5GB, but unbound eventually eats up far more memory than that, triggering an OOM.
Additionally, according to the only relevant upstream report we could find, we can make a second memory usage estimate based on the number of TCP connections, the number of threads and the msg-buffer-size. This gives us:
(66k [msg-buffer-size] * 1000 [TCP connections per thread] * 2 [incoming + outgoing] * 4 [threads])
or roughly 512MB of additional memory. We are still well below 16GB, yet unbound eventually gets killed due to OOM.
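For clarity, the same arithmetic in a few lines of Python, using the customer's configured values; the 66k per-connection buffer is the figure quoted from the upstream report above:

# Back-of-the-envelope check of the two estimates above.
GiB = 1024**3

msg_cache   = 1 * GiB            # msg-cache-size: 1G
rrset_cache = 2 * GiB            # rrset-cache-size: 2G
cache_estimate = 2.5 * (rrset_cache + msg_cache)    # upstream rule of thumb

msg_buffer = 66_000              # "66k" buffer size from the upstream report
num_tcp    = 1000                # incoming-num-tcp / outgoing-num-tcp
threads    = 4                   # num-threads
tcp_estimate = msg_buffer * num_tcp * 2 * threads   # incoming + outgoing, all threads

total = cache_estimate + tcp_estimate
print(f"cache estimate : {cache_estimate / GiB:.2f} GiB")   # ~7.50 GiB
print(f"tcp estimate   : {tcp_estimate / GiB:.2f} GiB")     # ~0.49 GiB
print(f"total estimate : {total / GiB:.2f} GiB of 16 GiB RAM")

Even taking both terms together, the expected footprint is around 8GB, half of the available RAM.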
Disabling THP has no effect other than delaying when the OOM is triggered; it still happens sooner or later. The customer needs to restart unbound periodically (every ~3 days) to avoid the OOM.