OpenShift Logging / LOG-1715

Preview replacing fluentd with vector as the primary collector for logging


    • Deploy Vector Collector as alternate Alpha offering
    • OBSDA-108 - Distribute an alternate Vector Log Collector
    • VERIFIED
    • 0% To Do, 0% In Progress, 100% Done

      Goals

      • Logging with a "preview" release of Vector as a collector
      • Smooth upgrade path from the last GA version of fluentd-based logging to Vector, if spec'd
      • Clear documentation for migrating from existing fluentd-specific API features
      • Compatible API extended to handle the vector collector
      • Internal API design docs on how to handle this and future "component replacement" API extensions in a consistent way

      Non-Goals

      TODO

      Motivation

      TODO

      Alternatives

      • fluent-bit

      Acceptance Criteria

      • Minimal support for output types: Kafka, Elasticsearch
      • Certificate authorization support for the implemented output types
      • Minimal normalization features: log level (as in LOG-1759), log source
      • Availability in the "preview" channel
      • Ability to switch deployment between fluentd and vector

      Stretch Goals

      • Support for output types: CloudWatch, Loki

      TODO

      Documentation Considerations

      TODO

      Open Questions

      Do we support both fluentd and vector as configurable options in a single version?

      My gut reaction is "Hell No", but we need to think through the ramifications.

      The ideal is that we support automatic, no-touch upgrade, which implies:

      • Vector logging supports the full range of API features that are not specifically related to fluentd with equivalent observable behavior.
      • Most fluentd-specific "tuning" parameters do not affect correct behavior, only performance under fluentd. They can be safely ignored by vector.
      • Any fluentd-specific configuration that does affect correctness (hopefully there is none) is translated via the full k8s 2-way API transformation trick into equivalent vector configuration.
        No short-cuts! If we do it right, it will Just Work. If we get it wrong, it will be worse than making the users migrate manually.
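
A rough sketch of what the three points above could look like in code, offered as an illustration rather than a design. Everything named here is an assumption: the `CollectorSpec` shape, the option keys in `TUNING_ONLY` and `KEY_MAP`, and the `example.com/fluentd-tuning` annotation are hypothetical and are not taken from the real logging API. The idea is that tuning-only options are dropped from the vector view but stashed so the reverse conversion can restore them, correctness-affecting options are mapped explicitly, and anything without a known equivalent fails loudly instead of being silently ignored.

```python
import json
from dataclasses import dataclass, field

# All names below are hypothetical placeholders, not real API fields.
TUNING_ONLY = {"flush_interval", "chunk_limit_size"}  # performance only, safe for vector to ignore
KEY_MAP = {"overflow_action": "when_full"}            # assumed fluentd -> vector option mapping
REVERSE_KEY_MAP = {v: k for k, v in KEY_MAP.items()}
STASH = "example.com/fluentd-tuning"                  # hypothetical annotation key


@dataclass
class CollectorSpec:
    type: str
    options: dict = field(default_factory=dict)
    annotations: dict = field(default_factory=dict)


def to_vector(spec: CollectorSpec) -> CollectorSpec:
    """Convert a fluentd spec to a vector spec without losing information."""
    if spec.type != "fluentd":
        raise ValueError(f"expected a fluentd spec, got {spec.type!r}")
    options, stashed = {}, {}
    for key, value in spec.options.items():
        if key in TUNING_ONLY:
            stashed[key] = value              # ignored by vector, kept for the way back
        elif key in KEY_MAP:
            options[KEY_MAP[key]] = value     # affects correctness, so it is translated
        else:
            raise ValueError(f"no vector equivalent known for {key!r}")  # no short-cuts
    annotations = dict(spec.annotations)
    if stashed:
        annotations[STASH] = json.dumps(stashed, sort_keys=True)
    return CollectorSpec("vector", options, annotations)


def to_fluentd(spec: CollectorSpec) -> CollectorSpec:
    """Reverse conversion: restore stashed tuning and map options back."""
    if spec.type != "vector":
        raise ValueError(f"expected a vector spec, got {spec.type!r}")
    annotations = dict(spec.annotations)
    options = json.loads(annotations.pop(STASH, "{}"))
    for key, value in spec.options.items():
        if key not in REVERSE_KEY_MAP:
            raise ValueError(f"no fluentd equivalent known for {key!r}")
        options[REVERSE_KEY_MAP[key]] = value
    return CollectorSpec("fluentd", options, annotations)
```

With this shape, a downgrade is the upgrade run in reverse: `to_fluentd(to_vector(spec))` should give back the original spec, which is a property that can be checked in tests before committing to no-touch upgrades.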

      I believe this is a reasonable goal, but we need some research and design up front to commit to it.

      The only reason to GA support for both collectors in a single version is so users (and we ourselves) can "flip the switch" between fluentd and vector in order to test and debug problems, and to ensure users can fall back to a working fluentd configuration if vector fails somehow.

      The better solution is to "flip the switch" by upgrading/downgrading between two versions that each support a single collector.

      That does mean up-front work so that the upgrade/downgrade process is smooth enough for this to be realistic, but it will pay off:

      • It will be more work in the long-term to maintain a two-headed beast, probably for much longer than we expect or want.
      • We will have to bite the bullet on migration anyway when we eventually deprecate and drop fluentd, and it won't be any easier then.
      • Smooth upgrade/downgrade is, in itself, a valuable improvement to user experience and maintenance costs.

      NOTE: Part of this epic is refactoring the codebase to make it possible to support multiple collectors in one version. This is important for code structure and clarity, maintenance, debugging, future new collectors, internal trials etc. However, we don't want to offer that to customers because of the maintenance and testing burden it implies.
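
One possible shape for that refactoring, as a minimal sketch only: the class names, the `render_config` method, and the `RELEASE_ENABLED` set are hypothetical and do not describe the actual operator code. The codebase knows about every collector through a common interface, while a per-release setting decides which collectors are actually offered.

```python
from abc import ABC, abstractmethod
from typing import Dict, FrozenSet, List, Type


class Collector(ABC):
    """What the rest of the code needs from any collector (hypothetical interface)."""

    @abstractmethod
    def render_config(self, outputs: List[str]) -> str:
        """Return collector configuration text for the given outputs."""


class FluentdCollector(Collector):
    def render_config(self, outputs: List[str]) -> str:
        # Placeholder text, not real fluentd configuration syntax.
        return "fluentd placeholder config: " + ", ".join(outputs)


class VectorCollector(Collector):
    def render_config(self, outputs: List[str]) -> str:
        # Placeholder text, not real vector configuration syntax.
        return "vector placeholder config: " + ", ".join(outputs)


# The codebase supports both collectors...
REGISTRY: Dict[str, Type[Collector]] = {
    "fluentd": FluentdCollector,
    "vector": VectorCollector,
}

# ...but a given release exposes only one; internal trial builds could enable more.
RELEASE_ENABLED: FrozenSet[str] = frozenset({"vector"})


def new_collector(name: str, enabled: FrozenSet[str] = RELEASE_ENABLED) -> Collector:
    """Build the named collector, refusing ones this release does not offer."""
    if name not in REGISTRY:
        raise ValueError(f"unknown collector {name!r}")
    if name not in enabled:
        raise ValueError(f"collector {name!r} is not offered in this release")
    return REGISTRY[name]()
```

Keeping the customer-facing choice in one place (`RELEASE_ENABLED` here) means internal trials and debugging can use both collectors without widening what has to be tested and supported for customers.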

      Additional Notes

      TODO - break down into stories & tasks.


              vimalkum@redhat.com Vimal Kumar
              rhn-engineering-aconway Alan Conway
              Ishwar Kanse Ishwar Kanse