OCPBUGS-60106

NodeFilesystemAlmostOutOfFiles: tmpfs on /tmp has only 0.56% inodes available on RHOCP nodes

    • Quality / Stability / Reliability
    • False
    • Critical

      Description of problem:

      A customer is getting a NodeFilesystemAlmostOutOfFiles alert due to inode exhaustion on the /tmp mount.
      ~~~
      NodeFilesystemAlmostOutOfFiles
      Aug 4, 2025, 7:06 AM
      Filesystem on tmpfs, mounted on /tmp, at ocp-wrk2.ocp.int.teb.com.tr has only 0.56% available inodes left.
      ~~~
      
      Files under /tmp (e.g., exec-process-*) are deleted during deployment cleanup, but the inodes are not released. The suspected cause is that processes still hold open file handles to the deleted files, which would keep their inodes allocated and lead to inode exhaustion even though no large or visible files remain.
      
      ~~~
      $ df -i
      Filesystem        Inodes   IUsed     IFree IUse% Mounted on
      devtmpfs        98991429    1026  98990403    1% /dev
      tmpfs           99003452       2  99003450    1% /dev/shm
      tmpfs             819200  393448    425752   49% /run
      tmpfs               1024      18      1006    2% /sys/fs/cgroup
      /dev/sda4      166953480 7619948 159333532    5% /sysroot
      tmpfs            1048576 1043249      5327  100% /tmp
      /dev/sda3          98304     324     97980    1% /boot
      tmpfs           19800690      14  19800676    1% /run/user/1000
      ~~~
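To cross-check the percentage in the alert against the `df -i` figures, free inodes can be expressed as a share of the total. The following is a minimal sketch, assuming the GNU `df -i` column order shown above (Filesystem, Inodes, IUsed, IFree, IUse%, Mounted on); `inode_free_pct` is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: read `df -i` output on stdin and print
# "<mount point> <percent of inodes free>" for each filesystem.
# Assumes mount points contain no spaces and that column 2 is the
# total inode count and column 4 the free inode count.
inode_free_pct() {
    awk 'NR > 1 && $2 > 0 { printf "%s %.2f\n", $6, ($4 / $2) * 100 }'
}

# Example usage on a node (not executed here):
#   df -i | inode_free_pct
```

For the /tmp row above (5327 free of 1048576), this works out to roughly 0.51% free.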
      ~~~
      # du -sk /tmp/* | sort -n
      0       /tmp/systemd-private-517ea969a1204213bc60ff53f560b62b-chronyd.service-ILAovQ
      0       /tmp/systemd-private-517ea969a1204213bc60ff53f560b62b-dbus-broker.service-RntUTD
      0       /tmp/systemd-private-517ea969a1204213bc60ff53f560b62b-systemd-logind.service-wnJ3Rs
      4       /tmp/exec-process-115098596
      4       /tmp/exec-process-1335433519
      4       /tmp/exec-process-1456886553
      4       /tmp/exec-process-1539030326
      4       /tmp/exec-process-1919139384
      4       /tmp/exec-process-1927989415
      4       /tmp/exec-process-2114726635
      4       /tmp/exec-process-2398372778
      4       /tmp/exec-process-2695446883
      4       /tmp/exec-process-3143928529
      4       /tmp/exec-process-3492991084
      4       /tmp/exec-process-568241004
      4       /tmp/exec-process-670442358
      ~~~  
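The `du -sk` listing above reports size in 1K blocks, and the `/tmp/*` glob skips hidden entries, so it may not show where the inodes are actually going. The following is a minimal sketch that counts filesystem entries under each top-level entry of a directory, hidden ones included. `inode_count_by_entry` is a hypothetical helper name; the count is approximate (hard links are counted once per name, and file names containing newlines are not handled). Where GNU coreutils is available, `du --inodes` may give similar information.

```shell
#!/bin/sh
# Hypothetical helper: for each top-level entry of DIR (including
# dotfiles), print "<entry count> <path>", sorted ascending.
# The count includes the entry itself and everything below it on the
# same filesystem (-xdev).
inode_count_by_entry() {
    dir=${1%/}
    find "$dir" -mindepth 1 -maxdepth 1 -xdev | while IFS= read -r entry; do
        n=$(find "$entry" -xdev | wc -l | tr -d ' ')
        printf '%s %s\n' "$n" "$entry"
    done | sort -n
}

# Example usage on a node (not executed here):
#   inode_count_by_entry /tmp | tail -n 20
```

The largest counts appear last, so piping through `tail` shows the entries most likely responsible for the inode usage.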
      ~~~
      # lsof +L1 /tmp
      COMMAND       PID USER   FD   TYPE DEVICE SIZE/OFF NLINK   NODE NAME
      dbus-brok 3293159 core   12u   REG    0,1  2097152     0 150428 /memfd:dbus-broker-log (deleted)
      bash      3457170 core  cwd    DIR   0,45      100     5      1 /tmp
      lsof      3483980 core  cwd    DIR   0,45      100     5      1 /tmp
      lsof      3483981 core  cwd    DIR   0,45      100     5      1 /tmp
      ~~~
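In the `lsof +L1 /tmp` output above, the only entry marked deleted is the dbus-broker memfd on device 0,1, while /tmp itself is on device 0,45. As an independent check, the `/proc/<pid>/fd` symlinks can be scanned directly, since on Linux the link target of an unlinked open file ends in ` (deleted)`. The following is a minimal sketch; `deleted_open_files` is a hypothetical helper name, and it needs to run as root on the node to see every process.

```shell
#!/bin/sh
# Hypothetical helper: list open file descriptors whose target is a
# deleted file under DIR. Prints "<pid> <link target>" per match.
# Usage: deleted_open_files DIR [PROC_ROOT]   (PROC_ROOT defaults to /proc)
deleted_open_files() {
    dir=${1%/}
    proc_root=${2:-/proc}
    for fd in "$proc_root"/[0-9]*/fd/*; do
        # Processes may exit mid-scan, so a failed readlink is skipped.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            "$dir"/*" (deleted)")
                pid=${fd#"$proc_root"/}
                pid=${pid%%/*}
                printf '%s %s\n' "$pid" "$target"
                ;;
        esac
    done
}

# Example usage on a node (not executed here):
#   deleted_open_files /tmp | sort | uniq -c | sort -n
```

If this prints nothing for /tmp, held-open deleted files are probably not what is consuming the inodes, and the entries still present under /tmp would be the next thing to examine.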

      Version-Release number of selected component (if applicable):

      4.15.39

      Actual results:

      Inode usage on /tmp remains at 100%: df -i shows only 5327 of 1048576 inodes free (about 0.5%).
      As a result, the NodeFilesystemAlmostOutOfFiles alert keeps firing.

      Expected results:

      The inodes should be freed and the NodeFilesystemAlmostOutOfFiles alert should clear.  

      Additional info:

      Kindly investigate and confirm:
      - Whether this is a known issue with inode handling in tmpfs under OpenShift/CRI-O.
      - Whether a patch or cleanup mechanism can be introduced to prevent inode leaks from deleted files held by orphaned processes.
      - Whether tmpfs size/inode settings should be adjusted at mount time in the base OS or cluster configuration. 
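On the last point, tmpfs accepts an `nr_inodes` mount option, and the 1048576 total reported by `df -i` for /tmp suggests such a limit is in effect. The following is a minimal sketch for reading the configured value from the mount table, assuming the option is listed there; `tmpfs_nr_inodes` is a hypothetical helper name. The remount command in the comment uses an illustrative value and has not been validated on RHCOS. Raising the limit would likely only postpone exhaustion if entries keep accumulating.

```shell
#!/bin/sh
# Hypothetical helper: print the nr_inodes mount option of the tmpfs
# mounted at MOUNTPOINT, if the option appears in the mount table.
# Usage: tmpfs_nr_inodes MOUNTPOINT [MOUNTS_FILE]  (default /proc/mounts)
tmpfs_nr_inodes() {
    mnt=$1
    mounts=${2:-/proc/mounts}
    awk -v m="$mnt" '$2 == m && $3 == "tmpfs" {
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] ~ /^nr_inodes=/) {
                sub(/^nr_inodes=/, "", opts[i])
                print opts[i]
            }
    }' "$mounts"
}

# Example usage on a node (not executed here):
#   tmpfs_nr_inodes /tmp
#
# Temporary, illustrative workaround (value is an assumption):
#   mount -o remount,nr_inodes=2097152 /tmp
```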
      
      Note:
      The customer is unable to collect an sos report because toolbox fails with the following error:
      ~~~
      # toolbox
      Checking if there is a newer version of registry.redhat.io/rhel9/support-tools available...
      Spawning a container 'toolbox-root' with image 'registry.redhat.io/rhel9/support-tools'
      Detected RUN label in the container image. Using that as the default...
      301a31cf1c1941efe5de45d51846e6ddafccd5622605237a24685f63e1cd89fe
      Error: unable to start container "301a31cf1c1941efe5de45d51846e6ddafccd5622605237a24685f63e1cd89fe": container create failed (no logs from conmon): conmon bytes "": readObjectStart: expect { or n, but found , error found in #0 byte of ...||..., bigger context ...||...
      /bin/toolbox: failed to start container 'toolbox-root'
      ~~~
      
      Node debug is also failing:
      ~~~
      # oc debug node/ocp-wrk2.ocp.int.teb.com.tr
      Temporary namespace openshift-debug-rb985 is created for debugging node...
      Starting pod/ocp-wrk2ocpinttebcomtr-debug-n8fcw ...
      To use host binaries, run `chroot /host`
      warning: Container container-00 is unable to start due to an error: error reading container (probably exited) json message: EOF
      ^C
      Removing debug pod ...
      warning: Container container-00 is unable to start due to an error: error reading container (probably exited) json message: EOF
      ~~~

              rh-ee-kehannon Kevin Hannon
              rhn-support-sdharma Suruchi Dharma
              Min Li Min Li
              Votes: 0
              Watchers: 6
