OpenShift Logging: LOG-4436

Fluentd not releasing deleted file handles


    • NEW
    • NEW
      Prior to this change, fluentd was reported not to release rotated, deleted container log files when the cluster was under load. This change updates fluentd to v1.16.2, which shows improvements but does not conclusively resolve the issue. This change also introduces a fluentd tuning option that opens and closes files on every read. That option does resolve the issue but may lead to duplicate log collection.
    • Bug Fix
    • Proposed
    • Log Collection - Sprint 240, Log Collection - Sprint 241
    • Important
    • Customer Escalated
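      The release note above contrasts two ways a collector can read a log file. It can hold the file open between reads, or it can open and close the file on every read. The sketch below is a hypothetical toy tailer written only for illustration, not fluentd's implementation. It assumes a Linux host (for /proc) and a tail that accepts the POSIX "-c +N" form. On each poll, tail opens the file, copies everything past the saved byte offset, and exits. As a result, no descriptor is left pointing at the file once it is deleted.

```shell
#!/bin/sh
# Hypothetical toy tailer (illustration only, NOT fluentd's in_tail):
# it reopens the log on every poll and remembers only a byte offset,
# so it never holds a descriptor on the file between polls.
# Assumes Linux (/proc) and a POSIX tail that accepts "-c +N".

log=$(mktemp)        # stands in for a container log such as .../0.log
collected=$(mktemp)  # where the "collected" records are appended
offset=0             # number of bytes of $log consumed so far

read_new_data() {
  [ -e "$log" ] || return 0   # file already removed: nothing to read or hold
  # tail opens $log, copies every byte after $offset, then exits (closing it);
  # the offset advances by exactly the number of bytes copied.
  copied=$(tail -c +"$((offset + 1))" "$log" | tee -a "$collected" | wc -c)
  offset=$((offset + copied))
}

printf 'first\n' >>"$log"
read_new_data
printf 'second\n' >>"$log"
read_new_data
rm "$log"            # simulate the rotated log being deleted
read_new_data        # no-op: there is no handle left to release

# Count descriptors of this shell that still point at the deleted log (expect 0).
held=$(for fd in /proc/$$/fd/*; do readlink "$fd" 2>/dev/null; done | grep -cF "$log (deleted)" || true)
result=$(cat "$collected")
rm -f "$collected"
echo "offset=$offset held=$held"
```

      A collector that instead keeps the descriptor open between polls pins the deleted file's blocks until it closes that handle. That is the situation shown in the lsof output in the description.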

      Description of problem:

      Disk usage kept filling up, and the problem followed one application pod around the environment. du did not account for the used space, but lsof showed a large number of deleted files still held open by fluentd:

      fluentd   2712145                               root    8r      REG                8,4 828860702         480248013 /var/log/pods/example-5bbcf9c88-7gq5k_09859cdc-ed2d-4c42-9e1b-149322ce8cd8/istio-proxy/0.log (deleted)
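      The gap between du and lsof comes from how unlinked files behave. Deleting a file removes its directory entry, so du, which walks directories, no longer counts it. However, the data blocks stay allocated until the last open descriptor on the file is closed. The following sketch reproduces this in miniature. It assumes a Linux host (it reads /proc/<pid>/fd) and GNU coreutils stat, and it uses a hypothetical 1 MiB file named 0.log in a temporary directory.

```shell
#!/bin/sh
# Miniature reproduction of the lsof finding: a file deleted while a process
# still holds it open keeps its data until that descriptor is closed.
# Assumes Linux (/proc/<pid>/fd) and GNU coreutils (stat -c).

dir=$(mktemp -d)
# Write a 1 MiB stand-in for a container log (0.log is a hypothetical name).
dd if=/dev/zero of="$dir/0.log" bs=1024 count=1024 2>/dev/null

exec 3<"$dir/0.log"   # hold a read descriptor, as a log collector would
rm "$dir/0.log"       # unlink it, as happens when a rotated log is removed

target=$(readlink /proc/$$/fd/3)            # ends in " (deleted)"
held_bytes=$(stat -L -c %s /proc/$$/fd/3)   # data is still reachable: 1048576 bytes
dir_usage=$(du -sk "$dir" | cut -f1)        # du no longer counts the file

echo "fd 3 -> $target: $held_bytes bytes held, du reports ${dir_usage}K"

exec 3<&-             # closing the last descriptor is what frees the blocks
rmdir "$dir"
```

      The " (deleted)" suffix printed by readlink is the same marker that appears at the end of the lsof line above.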
      

      This bug is similar to the issue described for Vector in LOG-3949.

      Version-Release number of selected component (if applicable):

      Logging 5.6.6

      How reproducible:

      Not reproducible on demand. The problem occurs when pod log files are deleted while fluentd still holds them open, so their disk space is not released. It can be observed on an impacted node:

      $ oc debug node/<node>
      # chroot /host
      # toolbox 
      # dnf install -y lsof
      # lsof | grep -i deleted > lsof.out
      # grep -c fluentd lsof.out
      11655
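      On Linux, the kernel appends " (deleted)" to the /proc/<pid>/fd symlink target of any descriptor whose file has been unlinked. The same count can therefore be taken directly from /proc. The following is a minimal sketch under that assumption. It needs only sh, readlink, and cat, so it should run on the node after chroot /host without installing lsof in toolbox. The function names are hypothetical. Matching on the name fluentd assumes that the collector's /proc/<pid>/comm value equals the COMMAND column shown by lsof.

```shell
#!/bin/sh
# Count open-but-deleted files straight from /proc (Linux only).
# Run as root on the node so every process's fd directory is readable.

# count_deleted_fds PID: print how many of PID's descriptors point at
# unlinked files (the kernel appends " (deleted)" to the symlink target).
count_deleted_fds() {
  n=0
  for fd in /proc/"$1"/fd/*; do
    case "$(readlink "$fd" 2>/dev/null)" in
      *' (deleted)') n=$((n + 1)) ;;
    esac
  done
  echo "$n"
}

# report_deleted_fds [NAME]: print "PID NAME COUNT" for each process whose
# /proc/<pid>/comm equals NAME (default "fluentd" is an assumption).
report_deleted_fds() {
  name=${1:-fluentd}
  for comm_file in /proc/[0-9]*/comm; do
    pid=${comm_file#/proc/}
    pid=${pid%/comm}
    [ "$(cat "$comm_file" 2>/dev/null)" = "$name" ] || continue
    echo "$pid $name $(count_deleted_fds "$pid")"
  done
}

report_deleted_fds fluentd
```

      Unlike grepping the whole lsof listing, this reports the count per process. That makes it easier to confirm that the deleted handles belong to the collector rather than to some other process.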

      Actual results:

      Many of the files that have been deleted but not yet released by the OS are pod log files that fluentd, running in the collector pods, continues to hold open.

      Expected results:

      Fluentd should release deleted files.

      Workaround

      Restart the logging collector pods to release the deleted files:

      $ oc delete pods -l component=collector -n openshift-logging 

            jcantril@redhat.com Jeffrey Cantrill
            rhn-support-ocasalsa Oscar Casal Sanchez
            Anping Li
            Votes: 0
            Watchers: 1
