Type: Bug
Resolution: Unresolved
Priority: Undefined
Fix Version/s: None
Affects Version/s: rhel-8.10
Component/s: systemd
Labels:
None

Regression:
No
Severity:
Low

AssignedTeam:
rhel-systemd

Story Points:
8
Blocked:
False
Ready:
False
Blocked Reason:

None

Product Documentation Required:
None
Products:

Red Hat Enterprise Linux
Sprint:
None

Preliminary Testing:
None
Test Coverage:
None

ProdDocsReview-CCS:
Unspecified
ProdDocsReview-Dev:
Unspecified
ProdDocsReview-QE:
Unspecified

Planning:
None

This seems to only affect RHEL8, because apparently /run/mount/utab is not processed on RHEL9.

The issue appears to stem from a combination of an implementation issue in util-linux's mnt_table_merge_user_fs() and suboptimal code in systemd.

A customer is hitting an issue where they cannot restart chronyd.service because it times out in systemd's child process, i.e. while setting up the mounts and before exec'ing /usr/bin/chronyd.
Like several other services, chronyd.service sets ProtectHome=yes, which remounts/unmounts file systems in the service's private mount namespace.

The customer has many mounts (1681).
However, for some unknown reason (probably a bug in the util-linux implementation), the /run/mount/utab file contains 61731 lines, which are all duplicates of 69 unique lines (attached in Private JIRA ATTACH-15409).
Due to this combination of mounts and lines in /run/mount/utab, it becomes impossible to start the service within the allotted 1min 30s.
There are two root causes:

/run/mount/utab is read multiple times (this is likely a systemd implementation issue)
about 13 seconds of CPU time are consumed after each read of /run/mount/utab (this is likely a util-linux implementation issue)
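
To check whether a system shows the same duplication, the total and unique line counts of /run/mount/utab can be compared. The following is a minimal sketch; the helper name utab_stats is hypothetical and is not part of util-linux or systemd.

```shell
# Hypothetical helper: report total vs. unique line counts of a utab file.
utab_stats() {
    # $1: path to the utab file (defaults to the location discussed above)
    file="${1:-/run/mount/utab}"
    total=$(wc -l < "$file")
    unique=$(sort -u "$file" | wc -l)
    # $(( )) normalizes any padding some wc implementations add
    echo "total=$((total)) unique=$((unique))"
}
```

With the figures quoted above for the customer's file, a call to utab_stats would be expected to report something like total=61731 unique=69.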

Reproducer

I can reproduce the issue by mounting 100 NFS directories and bind-mounting each of them 20 times, for a total of 2000 bind mounts. I then duplicate the entries in /run/mount/utab until the file is very large (60000 entries, for example). Finally, I run a transient service with ProtectHome=yes.

On my laptop I export /root and create 100 subdirectories /root/%d

On the RHEL8 VM I mount these NFS exports

# for i in $(seq 1 100); do mkdir -p /root/$i; mount NFSSERVER:/root/$i /root/$i; done

On the RHEL8 VM I bind-mount these NFS exports 20 times each

# for i in $(seq 1 20); do mkdir -p /mnt/binds/$i; for j in $(seq 1 100); do mkdir -p /mnt/binds/$i/$j; mount --bind /root/$j /mnt/binds/$i/$j; done; done
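
Before going further, it can help to confirm that the expected number of mounts is in place. The sketch below counts entries in /proc/self/mountinfo whose mount point (the fifth field) starts with a given prefix; the helper name count_mounts is hypothetical.

```shell
# Hypothetical helper: count mounts whose mount point starts with a prefix.
count_mounts() {
    # $1: mount point prefix, e.g. /mnt/binds
    # $2: mountinfo file (defaults to the current process's view)
    awk -v prefix="$1" '
        index($5, prefix) == 1 { n++ }   # field 5 of mountinfo is the mount point
        END { print n + 0 }              # print 0 when nothing matched
    ' "${2:-/proc/self/mountinfo}"
}
```

After the loop above, count_mounts /mnt/binds should report roughly 2000. Note that the match is a plain string prefix, so a sibling path such as /mnt/binds2 would also be counted.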

I hack /run/mount/utab to have 600000 entries (60000 is sufficient, but it spins even longer with 600000 entries)

# for i in $(seq 1 6000); do cat /run/mount/utab; done > /run/mount/utab.new
# mv /run/mount/utab.new /run/mount/utab

I start the transient service
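
The exact command is not recorded here. A transient service with ProtectHome=yes can be started with systemd-run, for example as sketched below. The unit name utab-repro, the sleep duration, and the RUNNER preview variable are assumptions for illustration; the top output further down only shows that the payload was sleep.

```shell
# Hypothetical reproducer step; unit name and sleep duration are assumptions.
run_protected_sleep() {
    # RUNNER=echo previews the command instead of executing it;
    # when RUNNER is unset, systemd-run is executed directly.
    ${RUNNER:-} systemd-run --unit=utab-repro --property=ProtectHome=yes sleep 60
}
```

Calling run_protected_sleep as root starts the unit, and systemctl status utab-repro then shows whether it is still stuck in the pre-exec phase.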

Expected result

Service starts in a few seconds

Actual result

Before calling execve(), systemd's child process spins on the CPU for a long time (in mnt_table_merge_user_fs()), and we see repeated openat+read of /run/mount/utab, each followed by a long delay after the file is closed.

top output:

Tasks: 159 total,   3 running, 156 sleeping,   0 stopped,   0 zombie
%Cpu(s): 49.8 us,  0.2 sy,  0.0 ni, 49.8 id,  0.0 wa,  0.2 hi,  0.0 si,  0.0 st
MiB Mem :   3665.7 total,    206.5 free,    689.4 used,   2769.8 buff/cache
MiB Swap:   2048.0 total,   2045.5 free,      2.5 used.   2626.2 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                
  45471 root      20   0  414420 246984   4436 R  99.3   6.6   7:50.39 (sleep)                                                
     34 root      39  19       0      0      0 S   0.3   0.0   0:00.22 khugepaged                                             
  46047 root      20   0   54544   4440   3716 R   0.3   0.1   0:00.01 top

Startup still in systemd's child after 10 minutes:

[root@vm-utab8 ~]# date; ps -eaf | grep 45471
Fri Aug  1 08:19:19 CEST 2025
root       45471       1 98 08:09 ?        00:09:25 (sleep)

Multiple open/read/close on /run/mount/utab + delay (CPU computation) after closing:

# grep -A1 " close(4</run/mount/utab>" sleep.strace
45471 08:15:18.313164 close(4</run/mount/utab>) = 0 <0.000004>
45471 08:16:05.941783 umount2("/run/systemd/unit-root/root/7", UMOUNT_NOFOLLOW) = 0 <0.005742>
--
45471 08:16:06.420862 close(4</run/mount/utab>) = 0 <0.000004>
45471 08:16:55.042169 umount2("/run/systemd/unit-root/root/8", UMOUNT_NOFOLLOW) = 0 <0.005358>
--
45471 08:16:55.540636 close(4</run/mount/utab>) = 0 <0.000004>
45471 08:17:44.405181 umount2("/run/systemd/unit-root/root/9", UMOUNT_NOFOLLOW) = 0 <0.005442>
--
45471 08:17:45.166801 close(4</run/mount/utab>) = 0 <0.000011>
45471 08:18:32.310327 umount2("/run/systemd/unit-root/root/10", UMOUNT_NOFOLLOW) = 0 <0.004195>
--
45471 08:18:32.799824 close(4</run/mount/utab>) = 0 <0.000003>
45471 08:19:19.341849 umount2("/run/systemd/unit-root/root/11", UMOUNT_NOFOLLOW) = 0 <0.005690>
--
45471 08:19:19.837409 close(4</run/mount/utab>) = 0 <0.000004>
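
The delay after each close can be computed directly from the trace rather than read off by eye. The sketch below assumes a trace in the format shown above (PID, then a wall-clock timestamp with microseconds, then the syscall, with file descriptor paths decoded); the helper name utab_gaps is hypothetical, and the script does not handle traces that cross midnight.

```shell
# Hypothetical helper: print the time between each close() of
# /run/mount/utab and the next traced line.
utab_gaps() {
    # $1: strace output file in the format shown above
    awk '
    # Convert HH:MM:SS.usec to seconds since midnight.
    function secs(t,   p) { split(t, p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }
    # Line following a close of utab: report the gap and the syscall name.
    prev != "" {
        name = $3; sub(/\(.*/, "", name)
        printf "%.3f s after close, before %s\n", secs($2) - prev, name
        prev = ""
    }
    # Remember the timestamp of each close of utab.
    / close\([0-9]+<\/run\/mount\/utab>\)/ { prev = secs($2) }
    ' "$1"
}
```

Applied to the excerpt above, the gaps come out at roughly 47 to 49 seconds each, all spent on the CPU between closing the file and the next umount2().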

is related to

RHEL-106986 /run/mount/utab can contains thousands of duplicates, causing systemd to misbehave when starting a protected service (e.g. chronyd.service)

relates to

RHEL-115271 Slow logins because spawning systemd-hostnamed takes 5 seconds due to having many mounts

links to

https://github.com/systemd/systemd/issues/31137

Assignee: systemd maint mailing list

Reporter: Renaud Métrich

Developer: systemd maint mailing list

QA Contact: Frantisek Sumsal

Votes: 0

Watchers: 6

Created: 2025/08/01 8:12 AM

Updated: 2025/09/30 11:55 AM

Stale Date: 2026/08/13
