Type: Epic
Resolution: Done
Priority: Normal
Fix Version/s: Logging 5.8.0
Affects Version/s: None
Component/s: Log Collection
Labels:

Epic Name:
Flow Control
Color Status:
Green
Documentation Type:

Administer, API, Instructions
Epic Status:
In Progress
Flagged:

Impediment
QE Status:
VERIFIED
Hierarchy Progress Bar:

0% To Do, 0% In Progress, 100% Done
Release Note Text:

Preview: flow control, or rate limiting, restricts the volume of log data that can be collected and/or forwarded. Input limits prevent badly-behaved containers from overloading the logging system. Output limits put a ceiling on the amount of log data stored. Limits are enforced by dropping excess log records.
Release Note Type:
Feature
Release Note Status:
Proposed
Target Version:

Logging 5.7.0

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

Goals

As a cluster admin I can:

Limit per-container logging rates (bytes/sec) for selected containers:
- Optional cluster-wide default for all containers.
- Specific rate for containers in listed namespaces.
- Specific rate for containers matching a label selector.
Ignore (do not collect) logs from selected containers.
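
The selection rules above can be sketched as a small lookup. This is only an illustration, not the API that was shipped: the names `Rule`, `resolve_limit`, and `IGNORE` are hypothetical, and two behaviours are assumptions that this Epic does not specify: rules are tried in order with the first match winning, and a limit of 0 bytes/sec stands for "ignore".

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumption: a limit of 0 bytes/sec is used here to mean "do not collect".
IGNORE = 0


@dataclass
class Rule:
    """Hypothetical policy rule selecting containers by namespace and/or labels."""
    max_bytes_per_sec: int
    namespaces: frozenset = frozenset()               # empty set: any namespace
    match_labels: dict = field(default_factory=dict)  # empty dict: any labels

    def matches(self, namespace: str, labels: dict) -> bool:
        if self.namespaces and namespace not in self.namespaces:
            return False
        # Every listed label must be present with the same value.
        return all(labels.get(k) == v for k, v in self.match_labels.items())


def resolve_limit(rules, default: Optional[int], namespace: str, labels: dict) -> Optional[int]:
    """Return the per-container limit in bytes/sec, or None if unlimited.

    Assumption: the first matching rule wins; otherwise the optional
    cluster-wide default applies.
    """
    for rule in rules:
        if rule.matches(namespace, labels):
            return rule.max_bytes_per_sec
    return default


rules = [
    Rule(IGNORE, match_labels={"logging": "off"}),      # ignored containers
    Rule(1024, namespaces=frozenset({"dev", "test"})),  # listed namespaces
    Rule(65536, match_labels={"tier": "critical"}),     # label selector
]

print(resolve_limit(rules, 8192, "dev", {"app": "web"}))         # 1024
print(resolve_limit(rules, 8192, "prod", {"tier": "critical"}))  # 65536
print(resolve_limit(rules, 8192, "prod", {}))                    # 8192 (default)
print(resolve_limit(rules, 8192, "dev", {"logging": "off"}))     # 0 (ignored)
```

With no cluster-wide default (`default=None`), a container that matches no rule is left unlimited in this sketch.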

The logging system will drop data, if necessary, to keep containers within their limits.
Which data gets dropped depends on timing and other run-time factors in the logging stack.
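
One way to picture enforcement by dropping is a token bucket measured in bytes. This Epic does not say which algorithm the collector uses, so the token bucket, the class name `ByteRateLimiter`, and the `burst_bytes` parameter are all assumptions made for illustration. Timestamps are passed in explicitly so that the example is deterministic.

```python
from typing import Optional


class ByteRateLimiter:
    """Illustrative per-container limiter: a token bucket counted in bytes.

    Assumption: this is not the collector's actual mechanism, only a sketch
    of "drop excess records to stay within a bytes/sec limit".
    """

    def __init__(self, max_bytes_per_sec: float, burst_bytes: Optional[float] = None):
        self.rate = float(max_bytes_per_sec)
        # By default, allow a burst of at most one second's worth of bytes.
        self.capacity = float(max_bytes_per_sec if burst_bytes is None else burst_bytes)
        self.tokens = self.capacity
        self.last: Optional[float] = None
        self.dropped_bytes = 0

    def admit(self, record: bytes, now: float) -> bool:
        """Return True to forward the record, False to drop it."""
        if self.last is not None:
            elapsed = max(0.0, now - self.last)
            # Refill the budget for the time that has passed, up to capacity.
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        size = len(record)
        if size <= self.tokens:
            self.tokens -= size
            return True
        self.dropped_bytes += size
        return False


limiter = ByteRateLimiter(max_bytes_per_sec=100)
record = b"x" * 40
# Three records arrive in the same instant, a fourth one second later.
results = [limiter.admit(record, t) for t in (0.0, 0.0, 0.0, 1.0)]
print(results)                # [True, True, False, True]
print(limiter.dropped_bytes)  # 40
```

Which record is dropped depends purely on arrival time relative to the remaining budget, which matches the observation above that the dropped data depends on timing.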

We want admins to be able to:

Set predictable limits on logging:
- Simplify provisioning.
- Avoid unexpected overloads.

Non-goals

The following are not goals for this Epic; they will be covered separately:

Back-pressure (Epic LOG-1073) is a separate Epic. Some use cases will not tolerate back-pressure. Measurement and rate control are needed even with back-pressure.

Combined rate limits (Epic LOG-1074), for example a single limit shared by all containers in a namespace, are more useful to admins but more complex to implement. Per-container limits are a necessary first step and have some value on their own.

Content-based filtering, that is, dropping logs selectively based on content (e.g. debug vs. info logs), may be supported in the future.

Motivation

The logging system lacks flow control. The CRI-O container runtime writes logs to disk as fast as containers produce them; there is no coordination with the logging collector that reads those files. This results in:

Log loss if the logs are written faster than they are read.
Back-up of log data at various buffering points, which causes slow recovery and high latency.

We cannot prevent log loss completely, but we need to provide better control over it. In particular we need to ensure that "noisy neighbors" or "bad actors" can't clog up the system and prevent collecting logs from well-behaved applications.

Acceptance Criteria

Verify that a default per-container rate is enforced (data is dropped) correctly.
Verify that selective rates by label or namespace are enforced correctly.
Verify that ignored logs are not collected or forwarded.

Dependencies (internal and external)

Selector APIs - label selectors, namespace selectors
Perf/Scale team to verify performance implications for block policy.

Previous Work

Metrics and dashboards added by LOG-915.

Open questions

blocks

LOG-1073 Implement flow control by backpressure

LOG-1074 Combined rate limits for flow control

Closed

clones

RFE-1397 Allow customer to configure logging excludes via logging api

Closed

impacts account

OBSDA-324 Data cleanup and retention for logging

Closed

is documented by

OBSDOCS-479 [DOC] Flow control mechanisms for more predictable log collection

Closed

is related to

LOG-1032 [spike] Metric for produced container logs and logs collected by the collector

Closed

LOG-2399 Loki output exceeds Loki buffer size and rate limits.

Closed

relates to

LOG-1611 [spike]Study of new collectors for new metrics and Rate Control provisions

Closed

OBSDA-366 Allow changes to the exclude_path in fluent.conf in managed mode

Closed

links to

openshift/enhancements#1028: flow-control-api spec

QE-Tracker

Closed

Qiaoling Tang

Assignee: Alan Conway

Reporter: Christian Heidenreich (Inactive)

QA Contact: Qiaoling Tang

Votes: 1

Watchers: 19

Created: 2020/08/05 7:08 AM

Updated: 2024/10/07 3:25 PM

Resolved: 2023/10/31 11:49 PM
