OpenShift Logging / LOG-3949

Vector not releasing deleted file handles

    • False
    • None
    • False
    • NEW
    • ASSIGNED
    • Before this change, the collector relied on the default configuration setting when reading container log lines. As a result, on high-volume clusters the collector did not read rotated files efficiently and held on to deleted file handles for a long time. This change increases the number of bytes read, allowing the collector to process rotated files more efficiently.
    • Bug Fix
    • Proposed
    • Log Collection - Sprint 235, Log Collection - Sprint 238, Log Collection - Sprint 239, Log Collection - Sprint 240, Log Collection - Sprint 241, Log Collection - Sprint 242, Log Collection - Sprint 243
    • Critical

      Description of problem:

      The disk kept filling up, and the problem followed one application pod around the environment. du did not account for the usage, but lsof showed that Vector was still holding a large number of deleted files open:

      vector 3430171 root 163r REG 8,4 105040954 1040189142 /var/log/pods/example-dev_example-cmd-linux-2_a9a87c45-ecad-49af-bdb7-3877273e5b95/example-cmd-linux-pod/0.log.20230403-205041 (deleted)
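
When only captured `lsof` output is available (for example, attached to a support case), the space pinned by deleted files can be totalled per process. The following is a minimal sketch, assuming the default nine-column `lsof` layout of the line above (COMMAND, PID, USER, FD, TYPE, DEVICE, SIZE/OFF, NODE, NAME). The function name `deleted_bytes_by_pid` is illustrative and not part of any tool.

```python
from collections import defaultdict


def deleted_bytes_by_pid(lsof_lines):
    """Sum SIZE/OFF of regular files marked "(deleted)", keyed by (command, pid)."""
    totals = defaultdict(int)
    for line in lsof_lines:
        line = line.strip()
        if not line.endswith("(deleted)"):
            continue
        # Split into at most 9 fields so NAME keeps any embedded spaces.
        fields = line.split(None, 8)
        if len(fields) < 9 or fields[4] != "REG":
            continue
        command, pid, size = fields[0], fields[1], fields[6]
        # SIZE/OFF can hold an offset such as "0t0"; count plain byte sizes only.
        if size.isdigit():
            totals[(command, int(pid))] += int(size)
    return dict(totals)


sample = [
    "vector 3430171 root 163r REG 8,4 105040954 1040189142 "
    "/var/log/pods/example-dev_example-cmd-linux-2_a9a87c45-ecad-49af-bdb7-3877273e5b95/"
    "example-cmd-linux-pod/0.log.20230403-205041 (deleted)",
]
print(deleted_bytes_by_pid(sample))  # {('vector', 3430171): 105040954}
```

For the single line quoted above, this reports roughly 105 MB held by one descriptor; summing over every deleted entry gives the space that will be freed when the process releases its handles.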
      

      Deleting the collector pod (or killing the vector process) releases the files and they fully delete, clearing the space.
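
This behaviour also explains why du misses the space: the directory entries are gone, but each open descriptor keeps the file's blocks allocated until the process closes it or exits. On a live node the same check can be made directly against procfs. The following is a minimal sketch, assuming Linux procfs semantics, where the `/proc/<pid>/fd/<n>` symlink target ends in ` (deleted)` once the file has been unlinked. The names `deleted_open_files` and `proc_root` are illustrative.

```python
import os

DELETED_SUFFIX = " (deleted)"


def deleted_open_files(pid, proc_root="/proc"):
    """Return sorted (fd, path, size_bytes) for descriptors of `pid` whose file was unlinked.

    size_bytes is None when the descriptor cannot be stat'ed.
    """
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    results = []
    for name in os.listdir(fd_dir):
        link = os.path.join(fd_dir, name)
        try:
            target = os.readlink(link)
        except OSError:
            continue  # descriptor was closed between listdir() and readlink()
        if not target.endswith(DELETED_SUFFIX):
            continue
        try:
            # On procfs, stat() follows the descriptor to the still-open inode.
            size = os.stat(link).st_size
        except OSError:
            size = None
        results.append((int(name), target[: -len(DELETED_SUFFIX)], size))
    return sorted(results)
```

Reading another process's descriptors typically requires root, and the PID must be resolved in the same PID namespace as the `/proc` mount being read.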

      Version-Release number of selected component (if applicable):

      cluster-logging.5.5.4

      How reproducible:

      Not reproduced so far. The application that caused the issue is no longer running, so data cannot currently be gathered from the original cluster while the issue is active.

      Expected results:

      Vector should release deleted files.

      Additional info:

        1. deleted_fds_oldest_first.png (45 kB)
        2. deleted_fds_rotate_wait.png (101 kB)
        3. file-handle comparison.png (34 kB)
        4. Logging5.7.5Metrics.png (60 kB)
        5. screenshot-1.png (62 kB)
        6. vector_file_deleted_given_up_individual.png (164 kB)
        7. vector_file_deleted_given_up_total_sum.png (87 kB)
        8. vector_open_files_5.8.0.png (58 kB)
        9. vector_open_files_record1.png (63 kB)
        10. vector_openfile_0926.png (54 kB)

            syedriko_sub@redhat.com Sergey Yedrikov
            rhn-support-stwalter Steven Walter
            Anping Li Anping Li