  Observability and Data Analysis Program / OBSDA-540

Vector and Fluentd comparison to help with transition/adoption


Details

    • Feature
    • Resolution: Unresolved
    • Major
    • Logging 5.7, Logging 5.6, Logging 5.8
    • Log Collection, PM Logging

    Description

      Proposed title of this feature request
      Vector and Fluentd comparison
      What is the nature and description of the request?

      Vector is now GA as the collector and Fluentd is deprecated. However, it is not clear how to perform this migration, what its impact is (uncompressed logs are reprocessed), what the pros/cons are, how the performance compares, etc.
      
      This lack of clarity about the change of solution, and about how Fluentd works compared with how Vector works (currently in memory), hinders the adoption of Vector and the transition/migration from Fluentd. A comparison between the two would be very helpful.
      
      Some points that are commonly requested:
      
      Vector and Fluentd comparison
        - Features (as listed in the current Logging documentation) [1]
        - Simulate the same load for Fluentd and for Vector, and:
          + share memory and CPU usage for both
          + number of events dropped
          + number of events read per second
          + time to ingest the load
        - Vector works in memory out of the box, whereas Fluentd uses disk buffering, so:
          + reliability vs performance
          + Vector memory usage can grow when there is backpressure while delivering the logs
        - Define how the outputs behave by default:
          + Retry (what is and is not retried)
          + Drop or block
          + Vector holds 500 events per output in memory; Fluentd buffers to disk, 8GB per output by default
          + Vector adaptive concurrency
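      The drop-or-block trade-off for a bounded in-memory output buffer can be illustrated with a toy model. This is a hypothetical simulation, not Vector's implementation: the 500-event capacity matches the per-output figure above, the mode names `block` and `drop_newest` follow Vector's buffer `when_full` options, and the produce/deliver rates are arbitrary assumptions for a sink that is slower than the source.

```python
from collections import deque


def simulate(when_full: str, capacity: int = 500, ticks: int = 100,
             produced_per_tick: int = 20, delivered_per_tick: int = 10) -> dict:
    """Toy model of one output with a bounded in-memory buffer (hypothetical).

    Each tick the source offers `produced_per_tick` events and the sink
    (e.g. a slow log store) accepts at most `delivered_per_tick` events.
    """
    buffer = deque()
    dropped = 0
    delivered = 0
    blocked_upstream = 0  # events the source could not hand over yet

    for _ in range(ticks):
        # Source side: events still waiting upstream plus this tick's new events.
        pending = blocked_upstream + produced_per_tick
        blocked_upstream = 0
        for _ in range(pending):
            if len(buffer) < capacity:
                buffer.append(1)
            elif when_full == "drop_newest":
                dropped += 1  # buffer full: the incoming event is discarded
            else:  # "block": the event stays upstream (e.g. unread in the log file)
                blocked_upstream += 1
        # Sink side: deliver as many events as the destination accepts.
        for _ in range(min(delivered_per_tick, len(buffer))):
            buffer.popleft()
            delivered += 1

    return {"delivered": delivered, "dropped": dropped,
            "buffered": len(buffer), "blocked_upstream": blocked_upstream}


if __name__ == "__main__":
    for mode in ("drop_newest", "block"):
        print(mode, simulate(mode))
```

      With these assumed rates both modes deliver 1000 events and end with 490 events buffered; `drop_newest` loses the remaining 510 events, while `block` loses none inside the collector but leaves 510 events waiting upstream. For a file-based collector that backlog corresponds to unread log lines on disk, which can still be lost if they are rotated away before being read.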
      
      Also, for the migration, indicate that all uncompressed logs will be reprocessed by Vector, which can lead to:
        - duplicated logs at the time of the migration
        - problems with the on-disk log store and with performance, as a consequence of the collector re-reading and processing all old logs
        - a memory and CPU peak in Vector until all old logs are processed (these logs can amount to several GB per node). This can also have a large impact. For example:
      
      + LogMaxSize: 100M, with 2 uncompressed log files per pod
      + Node with 100 pods
      
      GB to be read immediately:
        - Pod logs: 100MB x 2 x 100 = 20GB on this node
        - + journal logs
        - + node audit logs
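      The example above can be written as a small calculation. This is a worst-case sketch that assumes every uncompressed file has reached `LogMaxSize` and uses decimal units (1 GB = 1000 MB) as in the example; the journal and audit sizes are hypothetical per-node inputs that default to 0 (not measured).

```python
def estimate_node_backlog_gb(log_max_size_mb: float,
                             uncompressed_files_per_pod: int,
                             pods: int,
                             journal_gb: float = 0.0,
                             audit_gb: float = 0.0) -> float:
    """Worst-case GB a new collector would read at migration time on one node."""
    # Every uncompressed pod log file is assumed to be full (LogMaxSize).
    pod_logs_gb = log_max_size_mb * uncompressed_files_per_pod * pods / 1000
    # Journal and node audit logs are added as measured on the node.
    return pod_logs_gb + journal_gb + audit_gb


if __name__ == "__main__":
    # LogMaxSize 100M, 2 uncompressed files per pod, 100 pods.
    print(estimate_node_backlog_gb(100, 2, 100))  # 20.0
```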
      

       

      Why does the customer need this? (List the business requirements)

      Provide guidance on migrating from Fluentd to Vector, and facilitate the adoption of Vector by explaining how it works and its pros/cons.

      List any affected packages or components.

      Collector: Vector

      Additional information

      It would be good to have a tool/script/steps that estimates how many GB of journald, audit, infrastructure and application logs will be read at the moment of the migration. This would help assess the impact and schedule the migration from Fluentd to Vector during a low-load business window.
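      A starting point for such a tool could be a per-node script that sums the sizes of uncompressed log files. This is a hypothetical sketch with several assumptions that should be checked on real nodes: the directory paths below, that files ending in `.gz` are not re-read (following the statement above that uncompressed logs are reprocessed), that pod directories under `/var/log/pods` are named `<namespace>_<pod>_<uid>`, and that namespaces starting with `openshift` or `kube`, plus `default`, count as infrastructure. Journal sizes are an upper bound, since how much of the journal is re-read depends on the collector configuration.

```python
import os
from collections import defaultdict

# Hypothetical node paths; verify them on your nodes before relying on the output.
POD_LOG_ROOT = "/var/log/pods"
OTHER_CATEGORIES = {
    "journald": ["/var/log/journal"],
    "audit": ["/var/log/audit", "/var/log/kube-apiserver",
              "/var/log/openshift-apiserver", "/var/log/oauth-apiserver"],
}
# Assumed infrastructure namespace rule.
INFRA_PREFIXES = ("openshift", "kube")
INFRA_EXACT = {"default"}


def uncompressed_bytes(root: str) -> int:
    """Sum the sizes of all files under `root` that are not gzip-compressed."""
    total = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.endswith(".gz"):
                continue  # compressed rotated logs are assumed not to be re-read
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                pass  # file rotated or removed while scanning
    return total


def is_infra(namespace: str) -> bool:
    return namespace in INFRA_EXACT or namespace.startswith(INFRA_PREFIXES)


def estimate(pod_root: str = POD_LOG_ROOT,
             categories: dict = OTHER_CATEGORIES) -> dict:
    """Return bytes per category that a new collector might read on this node."""
    totals = defaultdict(int)
    if os.path.isdir(pod_root):
        for entry in os.listdir(pod_root):
            path = os.path.join(pod_root, entry)
            if not os.path.isdir(path):
                continue
            namespace = entry.split("_", 1)[0]  # <namespace>_<pod>_<uid>
            kind = "infrastructure" if is_infra(namespace) else "application"
            totals[kind] += uncompressed_bytes(path)
    for kind, roots in categories.items():
        for root in roots:
            if os.path.isdir(root):
                totals[kind] += uncompressed_bytes(root)
    return dict(totals)


if __name__ == "__main__":
    for kind, size in sorted(estimate().items()):
        print(f"{kind:15} {size / 1e9:8.2f} GB")
```

      Running it on each node (with access to the host's `/var/log`) and adding the results per category would give a rough cluster-wide figure to help choose the migration window.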


          People

            jamparke@redhat.com Jamie Parker
            rhn-support-ocasalsa Oscar Casal Sanchez
            Votes: 0
            Watchers: 8
