RHEL-6869

tuned throughput-performance's scheduler plugin usage yields high CPU usage

    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Undefined
    • Fix Version: rhel-8.10
    • Affects Version: rhel-8.6.0
    • Component: tuned
    • Fixed in Build: tuned-2.22.0-1.el8
    • Team: rhel-sst-cs-net-perf-services
    • Pool Team: ssg_core_services

      Description of problem:
      With RHEL 8.6, the tuned throughput-performance profile uses the scheduler plugin for some settings for which it previously (e.g. in RHEL 7.9) used the sysctl plugin.

      Version-Release number of selected component (if applicable):
      tuned-2.18.0-2.el8_6.1.noarch

      How reproducible:
      always

      Steps to Reproduce:
      1. make sure that the throughput-performance tuned profile is activated (otherwise: `tuned-adm profile throughput-performance`)
      2. increase the fork-rate of the system until the tuned process uses 30 % CPU or more (e.g. with the fork-rate sketch below)
      3. `perf trace -s -p $(pgrep tuned) -- sleep 60`
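
      For step 2, any fork-heavy workload will do; a minimal sketch of a fork-rate generator in Python (hypothetical, any equivalent stressor such as a shell loop spawning /bin/true works just as well):

      ```
      # crude fork-rate generator: spawns and reaps short-lived children
      # in a tight loop; stop it with Ctrl-C
      import os

      while True:
          pid = os.fork()
          if pid == 0:
              os._exit(0)       # child: exit immediately
          os.waitpid(pid, 0)    # parent: reap the child to avoid zombies
      ```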

      Actual results:
      tuned CPU usage increases with the fork rate, easily up to 30 % and more
      perf trace output shows high syscall rates for one tuned thread, i.e. for poll(), read(), openat(), lseek(), ioctl(), close() and fstat()

      Expected results:
      tuned CPU usage is very low (just a few percent) and is independent of the fork rate of the system.

      Additional info:
      This is caused by the way the scheduler plugin polls for process-creation events, which it does even when the plugin's configuration contains no process-matching declarations, as is the case with the throughput-performance profile. Each such event is then amplified by tuned invoking multiple syscalls on pseudo-files under /proc/$pid/.
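
      In other words (a simplified model, not tuned's actual code): the plugin receives an event for every fork/exec on the system and reacts by reading several pseudo-files under /proc/$pid/, so a per-event cost of a handful of syscalls is paid at the system's fork rate:

      ```
      # simplified illustration of the amplification: every process-creation
      # event triggers several /proc reads -- even when no process-matching
      # rule is configured that could make use of them
      def on_process_event(pid):
          for name in ("cmdline", "stat", "cgroup"):  # illustrative file names
              try:
                  with open(f"/proc/{pid}/{name}", "rb") as f:
                      f.read()
              except OSError:
                  pass  # the process may already have exited
      ```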

      Looking at a syscall trace in detail shows that a number of the syscalls used to read files under /proc/$pid/ are superfluous or even pointless (even if there were process-matching declarations in the config), e.g.:

      ```
      196436 openat(AT_FDCWD, "/proc/3678736/cmdline", O_RDONLY|O_CLOEXEC) = 28</proc/3678736/cmdline>
      196436 fstat(28</proc/3678736/cmdline>, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
      196436 ioctl(28</proc/3678736/cmdline>, TCGETS, 0x7f1113ffd410) = -1 ENOTTY (Inappropriate ioctl for device)
      196436 lseek(28</proc/3678736/cmdline>, 0, SEEK_CUR) = 0
      196436 ioctl(28</proc/3678736/cmdline>, TCGETS, 0x7f1113ffd3f0) = -1 ENOTTY (Inappropriate ioctl for device)
      196436 lseek(28</proc/3678736/cmdline>, 0, SEEK_CUR) = 0
      196436 read(28</proc/3678736/cmdline>, "/opt/xyz/bin/foobar\0foobar\0", 8192) = 23
      196436 read(28</proc/3678736/cmdline>, "", 8192) = 0
      196436 close(28</proc/3678736/cmdline>) = 0
      ```
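
      The fstat()/ioctl()/lseek() calls in this trace are typical of what a buffered, tty-aware open() adds on top of the actual read (glibc's isatty() is what issues the TCGETS ioctl). A hedged sketch of a leaner access pattern that reads the same data with only openat/read/close, assuming plain Python like tuned itself:

      ```
      import os

      def read_cmdline(pid):
          """Read /proc/<pid>/cmdline using raw file descriptors only,
          avoiding the extra fstat/ioctl/lseek of buffered I/O."""
          fd = os.open(f"/proc/{pid}/cmdline", os.O_RDONLY)
          try:
              chunks = []
              while True:
                  chunk = os.read(fd, 8192)
                  if not chunk:
                      return b"".join(chunks)
                  chunks.append(chunk)
          finally:
              os.close(fd)
      ```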

      A simple fix for the throughput-performance profile (which is activated by default on RHEL systems) is to convert the scheduler plugin settings back to sysctl ones, e.g. like this:

      ```
      --- /usr/lib/tuned/throughput-performance/tuned.conf 2022-06-08 11:48:16.000000000 +0200
      +++ new/throughput-performance/tuned.conf 2022-11-04 18:03:05.468461294 +0100
      @@ -58,12 +58,11 @@
       # and move them to swap cache
       vm.swappiness=10
       
      -[scheduler]
       # ktune sysctl settings for rhel6 servers, maximizing i/o throughput
       #
       # Minimal preemption granularity for CPU-bound tasks:
       # (default: 1 msec# (1 + ilog(ncpus)), units: nanoseconds)
      -sched_min_granularity_ns = 10000000
      +kernel.sched_min_granularity_ns = 10000000
       
       # SCHED_OTHER wake-up granularity.
       # (default: 1 msec# (1 + ilog(ncpus)), units: nanoseconds)
      @@ -71,7 +70,7 @@
       # This option delays the preemption effects of decoupled workloads
       # and reduces their over-scheduling. Synchronous workloads will still
       # have immediate wakeup/sleep latencies.
      -sched_wakeup_granularity_ns = 15000000
      +kernel.sched_wakeup_granularity_ns = 15000000
       
       # Marvell ThunderX
       [sysctl.thunderx]
      @@ -81,8 +80,8 @@
       kernel.numa_balancing=0
       
       # AMD
      -[scheduler.amd]
      -type=scheduler
      +[sysctl.amd]
      +type=sysctl
       uname_regex=x86_64
       cpuinfo_regex=${amd_cpuinfo_regex}
      -sched_migration_cost_ns=5000000
      +kernel.sched_migration_cost_ns=5000000
      ```
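
      After re-activating the profile (`tuned-adm profile throughput-performance`), the converted settings can be verified directly under /proc/sys; a small check, assuming the keys and values from the diff above (these sysctls exist on RHEL 8 kernels; on much newer kernels they moved to debugfs):

      ```
      # verify that the converted sysctl settings were applied
      expected = {
          "kernel.sched_min_granularity_ns": "10000000",
          "kernel.sched_wakeup_granularity_ns": "15000000",
      }
      for key, want in expected.items():
          path = "/proc/sys/" + key.replace(".", "/")
          with open(path) as f:
              got = f.read().strip()
          print(key, "=", got, "(ok)" if got == want else f"(expected {want})")
      ```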

        Assignee: Jaroslav Škarvada (jskarvad)
        Reporter: Georg Sauthoff (xeops-wu224, Inactive)
        Contacts: Jaroslav Škarvada, Robin Hack