RHEL-106970

Cannot start services configured with ProtectHome=yes due to having many mounts and huge number of lines in /run/mount/utab


    • Bug
    • Resolution: Unresolved
    • rhel-8.10
    • systemd
    • Low
    • rhel-systemd
    • Red Hat Enterprise Linux

      This seems to only affect RHEL8, because apparently /run/mount/utab is not processed on RHEL9.

      The issue seems to be due to a combination of an implementation issue in util-linux's mnt_table_merge_user_fs() and suboptimal code in systemd.

      A customer cannot restart chronyd.service: the start times out in systemd's child process, i.e. while it sets up the mounts and before it execs /usr/sbin/chronyd.
      chronyd.service, like several other services, sets ProtectHome=yes, which remounts/unmounts file systems in the service's private mount namespace.

      The customer has many mounts (1681).
      However, for an unknown reason (probably a bug in the util-linux implementation), the /run/mount/utab file contains 61731 lines, all of which are duplicates of 69 unique lines (attached in Private JIRA ATTACH-15409).
      With this many mounts and this many lines in /run/mount/utab, the service cannot start within the allotted 1 min 30 s.
      There are two contributing causes:

      1. /run/mount/utab is read multiple times (this is likely a systemd implementation issue)
      2. every single read of /run/mount/utab is followed by 13 seconds of CPU consumption (this is likely a util-linux implementation issue)
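
The duplication in /run/mount/utab described above can be checked by comparing the file's total and unique line counts. This is a minimal sketch: the helper name utab_dup_stats is hypothetical and is not part of util-linux or systemd.

```shell
# Hypothetical helper: report total vs. unique line counts of a utab-style file.
# Defaults to /run/mount/utab when no path is given.
utab_dup_stats() {
    file=${1:-/run/mount/utab}
    total=$(wc -l < "$file")
    unique=$(sort -u "$file" | wc -l)
    # Arithmetic expansion strips any padding that wc may add.
    echo "total=$((total + 0)) unique=$((unique + 0))"
}
```

Going by the figures in this report, running utab_dup_stats on the customer system would be expected to show 61731 total lines against 69 unique ones.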

      Reproducer

      I can reproduce the issue by mounting 100 NFS directories and bind-mounting each of them 20 times, for a total of 2000 mounts. I then duplicate the entries in /run/mount/utab until the file is very large (60000 entries, for example). Finally, I run a transient service with ProtectHome=yes.

      1. On my laptop I export /root and create 100 subdirectories /root/%d
      2. On the RHEL8 VM I mount these NFS exports
        # for i in $(seq 1 100); do mkdir -p /root/$i; mount NFSSERVER:/root/$i /root/$i; done
      3. On the RHEL8 VM I bind-mount these NFS exports 20 times each
        # for i in $(seq 1 20); do mkdir -p /mnt/binds/$i; for j in $(seq 1 100); do mkdir -p /mnt/binds/$i/$j; mount --bind /root/$j /mnt/binds/$i/$j; done; done
      4. I hack /run/mount/utab to have 600000 entries (60000 is sufficient, but it spins even longer with 600000 entries)
        # for i in $(seq 1 6000); do cat /run/mount/utab; done > /run/mount/utab.new
        # mv /run/mount/utab.new /run/mount/utab
      5. I start the transient service
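
The report does not give the exact command used in step 5. The following is only a sketch of one way to start a transient service with ProtectHome=yes through systemd-run. The helper name start_protecthome_probe, the DRY_RUN switch, and the choice of /usr/bin/sleep 1 as the payload are assumptions (the top output in this report only shows that the child is named "(sleep)").

```shell
# Hypothetical helper: start a throwaway transient unit with ProtectHome=yes
# and wait for it to finish. The payload (sleep 1) is an assumption.
# Set DRY_RUN=1 to print the command instead of running it.
start_protecthome_probe() {
    set -- systemd-run --wait -p ProtectHome=yes /usr/bin/sleep 1
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

Running it under time(1) as root gives a rough measure of how long the mount namespace setup takes before the payload is executed.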

      Expected result

      Service starts in a few seconds

      Actual result

      systemd's child spins on the CPU for a long time before execve() (in mnt_table_merge_user_fs()), and strace shows multiple openat+read sequences on /run/mount/utab, each followed by a long delay after the file is closed.

      top output:

      Tasks: 159 total,   3 running, 156 sleeping,   0 stopped,   0 zombie
      %Cpu(s): 49.8 us,  0.2 sy,  0.0 ni, 49.8 id,  0.0 wa,  0.2 hi,  0.0 si,  0.0 st
      MiB Mem :   3665.7 total,    206.5 free,    689.4 used,   2769.8 buff/cache
      MiB Swap:   2048.0 total,   2045.5 free,      2.5 used.   2626.2 avail Mem 
      
          PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                
        45471 root      20   0  414420 246984   4436 R  99.3   6.6   7:50.39 (sleep)                                                
           34 root      39  19       0      0      0 S   0.3   0.0   0:00.22 khugepaged                                             
        46047 root      20   0   54544   4440   3716 R   0.3   0.1   0:00.01 top                                                    
      

      Startup still in systemd's child after 10 minutes:

      [root@vm-utab8 ~]# date; ps -eaf | grep 45471
      Fri Aug  1 08:19:19 CEST 2025
      root       45471       1 98 08:09 ?        00:09:25 (sleep)
      

      Multiple open/read/close on /run/mount/utab + delay (CPU computation) after closing:

      # grep -A1 " close(4</run/mount/utab>" sleep.strace
      45471 08:15:18.313164 close(4</run/mount/utab>) = 0 <0.000004>
      45471 08:16:05.941783 umount2("/run/systemd/unit-root/root/7", UMOUNT_NOFOLLOW) = 0 <0.005742>
      --
      45471 08:16:06.420862 close(4</run/mount/utab>) = 0 <0.000004>
      45471 08:16:55.042169 umount2("/run/systemd/unit-root/root/8", UMOUNT_NOFOLLOW) = 0 <0.005358>
      --
      45471 08:16:55.540636 close(4</run/mount/utab>) = 0 <0.000004>
      45471 08:17:44.405181 umount2("/run/systemd/unit-root/root/9", UMOUNT_NOFOLLOW) = 0 <0.005442>
      --
      45471 08:17:45.166801 close(4</run/mount/utab>) = 0 <0.000011>
      45471 08:18:32.310327 umount2("/run/systemd/unit-root/root/10", UMOUNT_NOFOLLOW) = 0 <0.004195>
      --
      45471 08:18:32.799824 close(4</run/mount/utab>) = 0 <0.000003>
      45471 08:19:19.341849 umount2("/run/systemd/unit-root/root/11", UMOUNT_NOFOLLOW) = 0 <0.005690>
      --
      45471 08:19:19.837409 close(4</run/mount/utab>) = 0 <0.000004>
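
The delay after each close() can be computed from the trace instead of being read off by eye. This is a minimal sketch that assumes the line layout shown in the excerpt above (PID, then an HH:MM:SS.usec timestamp, then the syscall). The helper name utab_gaps is hypothetical, and it does not handle traces that cross midnight.

```shell
# Hypothetical helper: for every close() of /run/mount/utab in a strace log,
# print the number of seconds until the next traced line.
utab_gaps() {
    awk '
        # Convert an HH:MM:SS.usec timestamp to seconds since midnight.
        function secs(ts,    p) { split(ts, p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }
        # Skip the "--" group separators emitted by grep -A1.
        /^--$/ { next }
        # The line right after a utab close(): print the elapsed time.
        pending { printf "%.6f\n", secs($2) - closed_at; pending = 0 }
        # Remember when /run/mount/utab was closed.
        index($0, "close(") && index($0, "</run/mount/utab>") { closed_at = secs($2); pending = 1 }
    ' "$1"
}
```

Applied to the excerpt above (for example utab_gaps sleep.strace), the gaps come out at roughly 46 to 49 seconds per close(), which is the CPU time spent between reading the file and the following umount2() call.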
      

              systemd-maint (systemd maint mailing list)
              rhn-support-rmetrich (Renaud Métrich)
              systemd maint mailing list
              Frantisek Sumsal