OpenShift Logging / LOG-4362

`groupLimit` is always exceeded.


    Sprint: Log Collection - Sprint 239, Log Collection - Sprint 240, Log Collection - Sprint 241, Log Collection - Sprint 242, Log Collection - Sprint 243

      Description of problem:

      Create 3 projects and deploy pods in them; each container generates 6000 records/minute (a generator sketch follows the listing):

      $ oc get pod -n multiple-containers --show-labels
      NAME                   READY   STATUS    RESTARTS   AGE   LABELS
      centos-logtest-5np54   3/3     Running   0          13m   run=centos-logtest,test=centos-logtest
      $ oc get pod -n multiple-pods --show-labels
      NAME                           READY   STATUS    RESTARTS   AGE   LABELS
      logging-centos-logtest-724mp   1/1     Running   0          12m   run=centos-logtest,test=centos-logtest
      logging-centos-logtest-8txrf   1/1     Running   0          12m   run=centos-logtest,test=centos-logtest
      logging-centos-logtest-j9dpv   1/1     Running   0          12m   run=centos-logtest,test=centos-logtest
      $ oc get pod -n test-1 --show-labels
      NAME                           READY   STATUS    RESTARTS   AGE   LABELS
      logging-centos-logtest-brbr5   1/1     Running   0          11m   run=centos-logtest,test=centos-logtest 
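
      A minimal sketch of such a generator Deployment (assumption: the real test uses a centos-logtest image; the busybox loop here only approximates ~100 lines/second, i.e. 6000 records/minute per container):

      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: logging-centos-logtest
        namespace: multiple-pods
      spec:
        replicas: 3
        selector:
          matchLabels:
            run: centos-logtest
            test: centos-logtest
        template:
          metadata:
            labels:
              run: centos-logtest
              test: centos-logtest
          spec:
            containers:
            - name: logtest
              image: busybox
              # ~100 lines/second ≈ 6000 records/minute per container
              command: ["sh", "-c", "i=0; while true; do echo \"goodbye $i\"; i=$((i+1)); sleep 0.01; done"]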

      Create a ClusterLogForwarder (CLF) with the YAML below:

      apiVersion: logging.openshift.io/v1
      kind: ClusterLogForwarder
      metadata:
        name: instance
        namespace: openshift-logging
      spec:
        inputs:
        - application:
            groupLimit:
              maxRecordsPerSecond: 20
            selector:
              matchLabels:
                run: centos-logtest
                test: centos-logtest
          name: limited-rates
        pipelines:
        - inputRefs:
          - limited-rates
          - infrastructure
          - audit
          name: to-default
          outputRefs:
          - default 
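
      With groupLimit.maxRecordsPerSecond: 20, the expectation is that all pods matched by the selector share a single budget of 20 records/second, i.e. 1200 records/minute for the whole group.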

      Deploy logging, using vector as the collector and LokiStack as the log store.
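
      For reference, a minimal ClusterLogging instance for this setup could look like the sketch below (the LokiStack name logging-loki is an assumption):

      apiVersion: logging.openshift.io/v1
      kind: ClusterLogging
      metadata:
        name: instance
        namespace: openshift-logging
      spec:
        managementState: Managed
        logStore:
          type: lokistack
          lokistack:
            name: logging-loki   # assumed LokiStack CR name
        collection:
          type: vector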

      Wait for 2 minutes, then check the data in LokiStack:

      sum by(log_type)(count_over_time({log_type="application"}[1m])):
            {
              "metric": {
                "log_type": "application"
              },
              "values": [
                [
                  1689729244,
                  "798"
                ],
                [
                  1689729272,
                  "2481"
                ],
                [
                  1689729300,
                  "3580"
                ],
                [
                  1689729328,
                  "3590"
                ],
                [
                  1689729356,
                  "3584"
                ],
                [
                  1689729384,
                  "3589"
                ]
              ]
            }
      sum by(kubernetes_pod_name, kubernetes_namespace_name)(count_over_time({log_type="application"}[1m])):
          "result": [
            {
              "metric": {
                "kubernetes_namespace_name": "multiple-containers",
                "kubernetes_pod_name": "centos-logtest-5np54"
              },
              "value": [
                1689729385.518,
                "41"
              ]
            },
            {
              "metric": {
                "kubernetes_namespace_name": "multiple-pods",
                "kubernetes_pod_name": "logging-centos-logtest-724mp"
              },
              "value": [
                1689729385.518,
                "1200"
              ]
            },
            {
              "metric": {
                "kubernetes_namespace_name": "multiple-pods",
                "kubernetes_pod_name": "logging-centos-logtest-8txrf"
              },
              "value": [
                1689729385.518,
                "11"
              ]
            },
            {
              "metric": {
                "kubernetes_namespace_name": "multiple-pods",
                "kubernetes_pod_name": "logging-centos-logtest-j9dpv"
              },
              "value": [
                1689729385.518,
                "1145"
              ]
            },
            {
              "metric": {
                "kubernetes_namespace_name": "test-1",
                "kubernetes_pod_name": "logging-centos-logtest-brbr5"
              },
              "value": [
                1689729385.518,
                "1191"
              ]
            }
          ], 

      The total record count per minute is approximately 3600, which exceeds the groupLimit of 1200 records/minute (maxRecordsPerSecond: 20 × 60 s). The per-pod breakdown shows three of the pods each delivering roughly 1200 records/minute, which suggests the limit is being applied per pod rather than across the whole group.
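
      For context, groupLimit is presumably rendered in the collector config as vector's throttle transform. A minimal sketch in vector's YAML config syntax (the attached vector.toml is the TOML equivalent; component names here are hypothetical) showing a group-wide limit, and how a per-pod key would explain the numbers above:

      transforms:
        throttle_limited_rates:        # hypothetical component name
          type: throttle
          inputs:
            - route_application        # hypothetical upstream component
          threshold: 20                # max events per window
          window_secs: 1               # 20 events/s -> 1200 events/min for the group
          # If the generated config instead keys the throttle per pod, e.g.
          #   key_field: "{{ kubernetes.pod_name }}"
          # each pod gets its own 20 events/s budget, which matches the
          # observed ~1200 records/minute per pod (~3600/minute total).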

      Generated collector configuration: vector.toml (attached).

      Version-Release number of selected component (if applicable):

      openshift-logging/cluster-logging-rhel9-operator/images/v5.8.0-79

      How reproducible:

      Always

      Steps to Reproduce:

      See the `Description of problem` section above.

      Actual results:

      The total record count per minute is ~3600, three times the expected group limit.

      Expected results:

      The record count per minute should not exceed the groupLimit (1200 records/minute for maxRecordsPerSecond: 20).

      Additional info:

        1. vector.toml
          23 kB
          Qiaoling Tang
        2. kibana_moving_avg_config.png
          96 kB
          Jeffrey Cantrill
        3. Sanity_1lps.png
          47 kB
          Jeffrey Cantrill
        4. image-2023-09-14-16-16-59-743.png
          42 kB
          Jeffrey Cantrill
        5. group_limit_1200lpm.png
          42 kB
          Jeffrey Cantrill
        6. more_containers.png
          36 kB
          Jeffrey Cantrill
        7. Screenshot 2023-09-15 at 14.52.47.png
          262 kB
          Qiaoling Tang
        8. screenshot-1.png
          63 kB
          Jeffrey Cantrill
        9. Screenshot 2023-09-20 at 09.52.25.png
          278 kB
          Qiaoling Tang
        10. ip-10-0-65-129.us-east-2.compute.internal.png
          282 kB
          Qiaoling Tang
        11. ip-10-0-52-21.us-east-2.compute.internal.png
          283 kB
          Qiaoling Tang

              Assignee: Jeffrey Cantrill (jcantril@redhat.com)
              Reporter: Qiaoling Tang (qitang@redhat.com)
              Votes: 0
              Watchers: 3
