Type: Epic
Resolution: Done
Priority: Major
Fix Version/s: Logging 5.4.0
Affects Version/s: None
Component/s: Log Collection
Epic Name: Deploy Vector Collector as alternate Alpha offering
Story Points: 25
Blocked: False
Ready: False
Docs QE Status: NEW
Epic Status: To Do
Feature Link: OBSDA-108 - Distribute an alternate Vector Log Collector
QE Status: VERIFIED
Hierarchy Progress Bar: 0% To Do, 0% In Progress, 100% Done

Goals

A "preview" release of logging with Vector as the collector
Smooth upgrade path from the last GA version of fluentd-based logging to Vector, if specified
Clear documentation for migrating from existing fluentd-specific API features
Backward-compatible API extension to handle the Vector collector
Internal API design docs on how to handle this and future "component replacement" API extensions in a consistent way

Non-Goals

TODO

Motivation

TODO

Alternatives

fluent-bit

Acceptance Criteria

Minimal support for output types: Kafka, Elasticsearch
Certificate-based authorization support for the implemented output types
Minimal normalization features: log level (as in LOG-1759) and log source
Availability in the "preview" channel
Ability to switch deployment between fluentd and vector
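Because the preview only has to support Kafka and Elasticsearch, a configuration that selects Vector but forwards to any other output type cannot be honored. One plausible way to surface this is a validation step like the minimal Python sketch below; the `Output` type, the `unsupported_outputs` helper, and the choice to treat fluentd as unrestricted are hypothetical illustrations, not the operator's actual API.

```python
from dataclasses import dataclass
from typing import List

# Output types the vector preview must support, per the acceptance criteria.
VECTOR_PREVIEW_OUTPUTS = frozenset({"elasticsearch", "kafka"})


@dataclass(frozen=True)
class Output:
    """Hypothetical, simplified stand-in for a log forwarder output."""
    name: str
    type: str  # e.g. "elasticsearch", "kafka", "cloudwatch", "loki"


def unsupported_outputs(collector: str, outputs: List[Output]) -> List[str]:
    """Return the names of outputs the selected collector cannot deliver to.

    fluentd is treated as unrestricted here; only the vector preview is
    limited to the output types in VECTOR_PREVIEW_OUTPUTS.
    """
    if collector == "fluentd":
        return []
    if collector == "vector":
        return [o.name for o in outputs if o.type not in VECTOR_PREVIEW_OUTPUTS]
    raise ValueError(f"unknown collector type: {collector!r}")


if __name__ == "__main__":
    outputs = [Output("es", "elasticsearch"), Output("cw", "cloudwatch")]
    print(unsupported_outputs("vector", outputs))   # ['cw']
    print(unsupported_outputs("fluentd", outputs))  # []
```

If the stretch goals (CloudWatch, Loki) land, they would simply be added to the allowed set.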

Stretch Goals

Support for output types: CloudWatch, Loki

TODO

Documentation Considerations

TODO

Open Questions

Do we support both fluentd and vector as configurable options in a single version?

My gut reaction is "Hell No", but we need to think through the ramifications.

The ideal is to support an automatic, no-touch upgrade, which implies:

Vector logging supports, with equivalent observable behavior, the full range of API features that are not specific to fluentd.
Most fluentd-specific "tuning" parameters do not affect correct behavior, only performance under fluentd. They can be safely ignored by vector.
Any fluentd-specific configuration that does affect correctness (hopefully there is none) is translated via the full k8s 2-way API transformation trick into equivalent vector configuration.
No short-cuts! If we do it right, it will Just Work. If we get it wrong it will be worse than making the users migrate manually.
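The list above can be read as a conversion rule: performance-only fluentd settings are dropped, behavior-affecting ones go through an explicit mapping, anything unrecognized is an error rather than a guess, and the original fluentd section is preserved in an annotation so a downgrade restores it unchanged (similar to the annotation-based round-tripping described in the Kubernetes API change guidelines). A minimal Python sketch under those assumptions; the spec layout, every setting name, and the annotation key are hypothetical.

```python
import copy
import json

# Hypothetical fluentd-only settings that affect performance, not correctness.
FLUENTD_TUNING_KEYS = frozenset({"bufferChunkSize", "flushInterval", "flushThreads"})

# Hypothetical fluentd-only settings that DO change observable behavior,
# mapped to an equally hypothetical vector equivalent.
FLUENTD_TO_VECTOR = {"dropOnOverflow": "dropWhenFull"}

# Hypothetical annotation used to carry the original fluentd section.
STASH_ANNOTATION = "example.com/fluentd-settings"


def to_vector(spec: dict) -> dict:
    """Convert a fluentd collector spec to vector without losing information."""
    out = copy.deepcopy(spec)
    fluentd = out.pop("fluentd", {})
    vector = {}
    for key, value in fluentd.items():
        if key in FLUENTD_TUNING_KEYS:
            continue  # performance-only: safe for vector to ignore
        if key in FLUENTD_TO_VECTOR:
            vector[FLUENTD_TO_VECTOR[key]] = value
            continue
        # "No short-cuts": refuse to guess about unknown settings.
        raise ValueError(f"no vector equivalent for fluentd setting {key!r}")
    out["collector"] = "vector"
    if vector:
        out["vector"] = vector
    if fluentd:
        # Stash the original section so a downgrade can restore it exactly.
        annotations = out.setdefault("annotations", {})
        annotations[STASH_ANNOTATION] = json.dumps(fluentd, sort_keys=True)
    return out


def to_fluentd(spec: dict) -> dict:
    """Inverse of to_vector: drop vector settings, restore the stashed fluentd section."""
    out = copy.deepcopy(spec)
    out.pop("vector", None)
    annotations = out.get("annotations", {})
    stashed = annotations.pop(STASH_ANNOTATION, None)
    if not annotations:
        out.pop("annotations", None)
    out["collector"] = "fluentd"
    if stashed is not None:
        out["fluentd"] = json.loads(stashed)
    return out
```

The property worth testing is the round trip: for any supported fluentd spec, `to_fluentd(to_vector(spec)) == spec`, which is what makes "flipping the switch" by upgrade and downgrade safe.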

I believe this is a reasonable goal, but we need some research and design up front to commit to it.

The only reason to GA support for both collectors in a single version is so users (and we ourselves) can "flip the switch" between fluentd and vector in order to test and debug problems, and to ensure users can fall back to a working fluentd configuration if vector fails somehow.

The better solution is to "flip the switch" by upgrading/downgrading between two versions that each support a single collector.

That does mean up-front work so that the upgrade/downgrade process is smooth enough for this to be realistic, but it will pay off:

It will be more work in the long-term to maintain a two-headed beast, probably for much longer than we expect or want.
We will have to bite the bullet on migration anyway when we eventually deprecate and drop fluentd, and it won't be any easier then.
Smooth upgrade/downgrade is, in itself, a valuable improvement to user experience and maintenance costs.

NOTE: Part of this epic is refactoring the codebase to make it possible to support multiple collectors in one version. This is important for code structure and clarity, maintenance, debugging, future new collectors, internal trials etc. However, we don't want to offer that to customers because of the maintenance and testing burden it implies.
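The refactoring described in the NOTE amounts to hiding each collector behind a common interface, with the choice between implementations made in one place. A minimal Python sketch of that shape; the class names, method names, and config file names are hypothetical, and the operator's real code may be structured differently.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class Collector(ABC):
    """Hypothetical interface every collector implementation provides."""

    name = ""

    @abstractmethod
    def config_file(self) -> str:
        """Name of the collector-native configuration file."""

    def generate_config(self, outputs: List[str]) -> str:
        # Placeholder: real code would render a complete native config.
        return f"# {self.config_file()} for outputs: {', '.join(outputs)}"


class Fluentd(Collector):
    name = "fluentd"

    def config_file(self) -> str:
        return "fluent.conf"


class Vector(Collector):
    name = "vector"

    def config_file(self) -> str:
        return "vector.toml"


# Single place where collector implementations are registered and selected.
COLLECTORS: Dict[str, Type[Collector]] = {c.name: c for c in (Fluentd, Vector)}


def new_collector(kind: str) -> Collector:
    """Instantiate the collector registered under `kind`."""
    try:
        return COLLECTORS[kind]()
    except KeyError:
        raise ValueError(f"unknown collector type: {kind!r}") from None


if __name__ == "__main__":
    print(new_collector("vector").generate_config(["es", "kafka"]))
```

Because selection is centralized in `new_collector`, internal builds can switch collectors for trials and debugging, while a customer-facing release could register only one implementation, consistent with not offering both collectors to customers.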

Additional Notes

TODO - break down into stories & tasks.

blocks

LOG-1795 GA vector as the primary collector for logging (Phase 1) [Closed]
OBSDA-116 Provide Tech Preview Support for Vector Collector with OpenShift Logging [Closed]
OBSDA-117 Provide Tech Preview Support for Vector with Kafka for OpenShift [Closed]
OBSDA-119 Provide Tech Preview Support for Vector with Elasticsearch for OpenShift [Closed]

is blocked by

LOG-1423 Provide an alternative collector to the Kafka SRE team so that they can forward log messages to our internal Loki service
LOG-1766 Define, document, and implement strategy for releasing preview features [Closed]

is cloned by

LOG-1795 GA vector as the primary collector for logging (Phase 1) [Closed]

There are no Sub-Tasks for this issue.

Assignee: Vimal Kumar
Reporter: Alan Conway
QA Contact: Ishwar Kanse
Votes: 0
Watchers: 9
Created: 2021/08/27 1:21 PM
Updated: 2022/09/09 6:23 AM
Resolved: 2022/04/14 2:58 AM
