-
Bug
-
Resolution: Unresolved
-
rhel-8.10
-
Low
-
rhel-systemd
-
8
-
Red Hat Enterprise Linux
This seems to only affect RHEL8, because apparently /run/mount/utab is not processed on RHEL9.
The issue appears to be caused by a combination of an implementation issue in util-linux's mnt_table_merge_user_fs() and suboptimal code in systemd.
A customer is hitting an issue where they cannot restart chronyd.service: the start times out in systemd's child process, i.e. while the mounts are being set up and before /usr/bin/chronyd is exec'ed.
Like several other services, chronyd.service sets ProtectHome=yes, which remounts/unmounts file systems in the service's private mount namespace.
The customer has many mounts (1681).
However for some unknown reason (probably a bug in util-linux implementation), the /run/mount/utab file contains 61731 lines, which are all duplicates of 69 unique lines (attached in Private JIRA ATTACH-15409).
Due to this combination of mounts and lines in /run/mount/utab, it becomes impossible to start the service within the allotted 1 min 30 s.
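The duplication described above can be checked by comparing the total and unique line counts of the file. The following is a minimal sketch; `utab_dup_stats` is a hypothetical helper name introduced here for illustration and is not part of util-linux or systemd.

```shell
# Print "<total lines> <unique lines>" for a utab-style file.
# utab_dup_stats is a hypothetical helper, not an existing tool.
utab_dup_stats() {
    file="${1:-/run/mount/utab}"
    # tr strips any padding that some wc implementations add.
    total=$(wc -l < "$file" | tr -d '[:space:]')
    # LC_ALL=C makes sort compare raw bytes, so only exact duplicates collapse.
    unique=$(LC_ALL=C sort -u "$file" | wc -l | tr -d '[:space:]')
    echo "$total $unique"
}
```

With the figures quoted for the customer's system, running `utab_dup_stats` against /run/mount/utab would be expected to report 61731 total lines and 69 unique lines.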
There are multiple root causes:
- /run/mount/utab is read multiple times (this is likely a systemd implementation issue)
- 13 seconds of CPU time are consumed after reading /run/mount/utab once (this is likely a util-linux implementation issue)
Reproducer
I can reproduce the issue by mounting 100 NFS directories and bind-mounting each of them 20 times, for a total of 2000 bind mounts. I then duplicate the entries in /run/mount/utab until the file is very large (60000 entries, for example). Finally, I run a transient service with ProtectHome=yes.
- On my laptop I export /root and create 100 subdirectories /root/%d
- On the RHEL8 VM I mount these NFS exports
# for i in $(seq 1 100); do mkdir -p /root/$i; mount NFSSERVER:/root/$i /root/$i; done
- On the RHEL8 VM I bind-mount these NFS exports 20 times each
# for i in $(seq 1 20); do mkdir -p /mnt/binds/$i; for j in $(seq 1 100); do mkdir -p /mnt/binds/$i/$j; mount --bind /root/$j /mnt/binds/$i/$j; done; done
- I hack /run/mount/utab to have 600000 entries (60000 is sufficient, but it spins even longer with 600000 entries)
# for i in $(seq 1 6000); do cat /run/mount/utab; done > /run/mount/utab.new
# mv /run/mount/utab.new /run/mount/utab
- I start the transient service
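The exact command used for the last step is not recorded in this report. Since the spinning process shows up as `(sleep)`, one plausible form is a transient unit started with `systemd-run`; the sketch below is an assumption, not the command actually used. `run_protected_sleep` and the `RUNNER` variable are hypothetical names introduced here so the command line can be previewed without executing it.

```shell
# Start a transient service with ProtectHome=yes that just runs sleep.
# run_protected_sleep is a hypothetical helper. RUNNER exists only so the
# command line can be previewed with RUNNER=echo instead of being executed.
run_protected_sleep() {
    # --wait blocks until the transient service has finished.
    "${RUNNER:-systemd-run}" --wait --property=ProtectHome=yes sleep "${1:-10}"
}
```

Calling `RUNNER=echo run_protected_sleep 5` prints the arguments that would be passed to `systemd-run`, which is useful for checking the invocation before running it as root on the test VM.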
Expected result
Service starts in a few seconds
Actual result
Before execve(), systemd's child spins on the CPU for a long time (in mnt_table_merge_user_fs()), and we see multiple openat+read sequences on /run/mount/utab, each followed by a long delay after the file is closed.
top output:
Tasks: 159 total,   3 running, 156 sleeping,   0 stopped,   0 zombie
%Cpu(s): 49.8 us,  0.2 sy,  0.0 ni, 49.8 id,  0.0 wa,  0.2 hi,  0.0 si,  0.0 st
MiB Mem :   3665.7 total,    206.5 free,    689.4 used,   2769.8 buff/cache
MiB Swap:   2048.0 total,   2045.5 free,      2.5 used.   2626.2 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
45471 root      20   0  414420 246984   4436 R  99.3   6.6   7:50.39 (sleep)
   34 root      39  19       0      0      0 S   0.3   0.0   0:00.22 khugepaged
46047 root      20   0   54544   4440   3716 R   0.3   0.1   0:00.01 top
Startup still in systemd's child after 10 minutes:
[root@vm-utab8 ~]# date; ps -eaf | grep 45471
Fri Aug 1 08:19:19 CEST 2025
root 45471 1 98 08:09 ? 00:09:25 (sleep)
Multiple open/read/close on /run/mount/utab + delay (CPU computation) after closing:
# grep -A1 " close(4</run/mount/utab>" sleep.strace
45471 08:15:18.313164 close(4</run/mount/utab>) = 0 <0.000004>
45471 08:16:05.941783 umount2("/run/systemd/unit-root/root/7", UMOUNT_NOFOLLOW) = 0 <0.005742>
--
45471 08:16:06.420862 close(4</run/mount/utab>) = 0 <0.000004>
45471 08:16:55.042169 umount2("/run/systemd/unit-root/root/8", UMOUNT_NOFOLLOW) = 0 <0.005358>
--
45471 08:16:55.540636 close(4</run/mount/utab>) = 0 <0.000004>
45471 08:17:44.405181 umount2("/run/systemd/unit-root/root/9", UMOUNT_NOFOLLOW) = 0 <0.005442>
--
45471 08:17:45.166801 close(4</run/mount/utab>) = 0 <0.000011>
45471 08:18:32.310327 umount2("/run/systemd/unit-root/root/10", UMOUNT_NOFOLLOW) = 0 <0.004195>
--
45471 08:18:32.799824 close(4</run/mount/utab>) = 0 <0.000003>
45471 08:19:19.341849 umount2("/run/systemd/unit-root/root/11", UMOUNT_NOFOLLOW) = 0 <0.005690>
--
45471 08:19:19.837409 close(4</run/mount/utab>) = 0 <0.000004>
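The delay after each close of /run/mount/utab can be computed directly from the strace file instead of being read off by eye. The following is a minimal sketch, assuming the trace has the layout seen in sleep.strace (PID, wall-clock timestamp with microseconds, then the syscall). `utab_close_gaps` is a hypothetical helper name.

```shell
# For every close() of /run/mount/utab in a strace log, print the number of
# seconds until the next traced syscall. utab_close_gaps is a hypothetical
# helper. Assumed line layout: "<pid> <HH:MM:SS.usec> <syscall...>".
# Traces that cross midnight are not handled.
utab_close_gaps() {
    LC_ALL=C awk '
        # Convert an HH:MM:SS.usec timestamp to seconds since midnight.
        function secs(ts,   p) {
            split(ts, p, ":")
            return p[1] * 3600 + p[2] * 60 + p[3]
        }
        # The line right after a matching close(): report the elapsed time.
        pending { printf "%.1f\n", secs($2) - start; pending = 0 }
        # Remember when /run/mount/utab was closed.
        $3 ~ /^close\([0-9]+<\/run\/mount\/utab>\)/ { start = secs($2); pending = 1 }
    ' "$1"
}
```

Applied to the excerpt above, each gap comes out at roughly 46.5 to 49 seconds, which is the CPU-bound phase that follows every read of the file.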
- is related to
-
RHEL-106986 /run/mount/utab can contains thousands of duplicates, causing systemd to misbehave when starting a protected service (e.g. chronyd.service)
-
- Planning
-
- relates to
-
RHEL-115271 Slow logins because spawning systemd-hostnamed takes 5 seconds due to having many mounts
-
- New