-
Bug
-
Resolution: Done-Errata
-
Undefined
-
rhel-8.6.0
-
tuned-2.22.0-1.el8
-
rhel-sst-cs-net-perf-services
-
ssg_core_services
-
26
-
Pass
-
Not Needed
-
RegressionOnly
Description of problem:
In RHEL 8.6, the tuned throughput-performance profile applies some settings through the scheduler plugin that were previously (e.g. in RHEL 7.9) applied through the sysctl plugin.
Version-Release number of selected component (if applicable):
tuned-2.18.0-2.el8_6.1.noarch
How reproducible:
always
Steps to Reproduce:
1. make sure that the throughput-performance tuned profile is activated (otherwise: `tuned-adm profile throughput-performance`)
2. increase the fork-rate of the system (until the tuned process uses 30 % CPU or more)
3. `perf trace -s -p $(pgrep tuned) -- sleep 60`
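Step 2 does not say how to raise the fork rate. One possible way is a small load generator like the sketch below. It is only an illustration: the function name `fork_burst` and the default duration are made up for this example, and the rate it reaches depends on the machine. Run several instances in parallel if a single one does not push tuned's CPU usage high enough.

```python
import os
import sys
import time


def fork_burst(duration_s: float) -> int:
    """Fork short-lived children for duration_s seconds and return how many were created."""
    count = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        pid = os.fork()
        if pid == 0:
            # Child: exit immediately without running cleanup handlers.
            os._exit(0)
        # Parent: reap the child so no zombies accumulate.
        os.waitpid(pid, 0)
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical CLI: optional duration in seconds as the first argument.
    duration = float(sys.argv[1]) if len(sys.argv) > 1 else 1.0
    print(f"forked {fork_burst(duration)} processes in {duration} s")
```

While this runs, watch the tuned process in `top` and start the `perf trace` command from step 3.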
Actual results:
tuned CPU usage increases with the fork rate, easily reaching 30 % or more.
perf trace output shows high syscall rates for one tuned thread, namely for poll(), read(), openat(), lseek(), ioctl(), close() and fstat().
Expected results:
tuned CPU usage is very low (just a few percent) and is independent of the fork rate of the system.
Additional info:
The cause is that the scheduler plugin polls for process creation events even when its configuration contains no process-matching declarations, as is the case with the throughput-performance profile. tuned then amplifies each such event by issuing multiple syscalls on pseudo-files under /proc/$pid/.
A detailed look at a syscall trace shows that many of the syscalls used to read files under /proc/$pid/ are superfluous or even pointless (even if there were process-matching declarations in the config), e.g.:
```
196436 openat(AT_FDCWD, "/proc/3678736/cmdline", O_RDONLY|O_CLOEXEC) = 28</proc/3678736/cmdline>
196436 fstat(28</proc/3678736/cmdline>,
) = 0
196436 ioctl(28</proc/3678736/cmdline>, TCGETS, 0x7f1113ffd410) = -1 ENOTTY (Inappropriate ioctl for device)
196436 lseek(28</proc/3678736/cmdline>, 0, SEEK_CUR) = 0
196436 ioctl(28</proc/3678736/cmdline>, TCGETS, 0x7f1113ffd3f0) = -1 ENOTTY (Inappropriate ioctl for device)
196436 lseek(28</proc/3678736/cmdline>, 0, SEEK_CUR) = 0
196436 read(28</proc/3678736/cmdline>, "/opt/xyz/bin/foobar\0foobar\0", 8192) = 23
196436 read(28</proc/3678736/cmdline>, "", 8192) = 0
196436 close(28</proc/3678736/cmdline>) = 0
```
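For comparison, the sketch below reads `/proc/$pid/cmdline` with only `openat()`, `read()` and `close()`. It is not tuned's code and not the fix that was shipped; it assumes that the extra `fstat()`, `ioctl(TCGETS)` and `lseek()` calls in the trace come from Python's buffered text-mode `open()`, which typically probes the file descriptor this way. The helper name `read_cmdline` is made up for this example.

```python
import os
from typing import List


def read_cmdline(pid: int) -> List[str]:
    """Return the argument vector of pid using unbuffered, low-level file I/O."""
    fd = os.open(f"/proc/{pid}/cmdline", os.O_RDONLY | os.O_CLOEXEC)
    try:
        chunks = []
        while True:
            chunk = os.read(fd, 8192)
            if not chunk:  # empty read signals end of file
                break
            chunks.append(chunk)
    finally:
        os.close(fd)

    raw = b"".join(chunks)
    if not raw:
        # Kernel threads expose an empty cmdline.
        return []
    # Arguments are NUL-separated, usually with one trailing NUL.
    if raw.endswith(b"\0"):
        raw = raw[:-1]
    return [arg.decode(errors="replace") for arg in raw.split(b"\0")]
```

Tracing `read_cmdline(os.getpid())` with `strace` should show roughly one `openat()`, two `read()` calls and one `close()` per lookup, instead of the nine syscalls in the trace above.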
A simple fix for the throughput-performance profile (which is activated by default on RHEL systems) is to convert the scheduler plugin settings back to sysctl settings, e.g. like this:
```
--- /usr/lib/tuned/throughput-performance/tuned.conf 2022-06-08 11:48:16.000000000 +0200
+++ new/throughput-performance/tuned.conf 2022-11-04 18:03:05.468461294 +0100
@@ -58,12 +58,11 @@
 # and move them to swap cache
 vm.swappiness=10
 
-[scheduler]
 # ktune sysctl settings for rhel6 servers, maximizing i/o throughput
 #
 # Minimal preemption granularity for CPU-bound tasks:
 # (default: 1 msec# (1 + ilog(ncpus)), units: nanoseconds)
-sched_min_granularity_ns = 10000000
+kernel.sched_min_granularity_ns = 10000000
 
 # SCHED_OTHER wake-up granularity.
 # (default: 1 msec# (1 + ilog(ncpus)), units: nanoseconds)
@@ -71,7 +70,7 @@
 # This option delays the preemption effects of decoupled workloads
 # and reduces their over-scheduling. Synchronous workloads will still
 # have immediate wakeup/sleep latencies.
-sched_wakeup_granularity_ns = 15000000
+kernel.sched_wakeup_granularity_ns = 15000000
 
 # Marvell ThunderX
 [sysctl.thunderx]
@@ -81,8 +80,8 @@
 kernel.numa_balancing=0
 
 # AMD
-[scheduler.amd]
-type=scheduler
+[sysctl.amd]
+type=sysctl
 uname_regex=x86_64
 cpuinfo_regex=${amd_cpuinfo_regex}
-sched_migration_cost_ns=5000000
+kernel.sched_migration_cost_ns=5000000
```
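To check whether a given profile still activates the scheduler plugin, before or after applying a change like the one above, a small checker along these lines may help. It is a sketch under assumptions: it parses the profile with Python's generic `configparser` rather than tuned's own loader, and it assumes that a section without an explicit `type=` option uses its section name as the plugin type. The function name `scheduler_sections` is made up for this example.

```python
import configparser
from typing import List


def scheduler_sections(conf_text: str) -> List[str]:
    """Return the names of all sections in a tuned.conf that use the scheduler plugin."""
    # Disable interpolation so values such as ${amd_cpuinfo_regex} or '%' pass through.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(conf_text)

    found = []
    for name in parser.sections():
        # Assumption: without an explicit type, the section name is the plugin type.
        plugin = parser.get(name, "type", fallback=name).strip()
        if plugin == "scheduler":
            found.append(name)
    return found
```

For example, `scheduler_sections(open("/usr/lib/tuned/throughput-performance/tuned.conf").read())` would be expected to list `scheduler` and `scheduler.amd` for the unpatched profile and nothing for the patched one.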
Links to:
RHBA-2024:127904 tuned bug fix and enhancement update