
Type: Bug
Resolution: Done
Priority: Blocker
Fix Version/s: Logging 5.8.0
Affects Version/s: Logging 5.8.0
Component/s: Log Collection
Labels:
- devel_ack+
- no-rn

Blocked:
False
Blocked Reason:
None
Ready:
False
Epic Link:
Flow Control
Docs QE Status:
NEW
QE Status:
VERIFIED
Release Note Type:
Release Note Not Required
Intelligence Requested:
Market:

Sprint:
Log Collection - Sprint 243

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

Description of problem:

After setting outputs[].limit.maxRecordsPerSecond on an output and monitoring the doc count of application logs in the log store, the observed rate is always $count-of-worker-nodes * maxRecordsPerSecond records per second instead of maxRecordsPerSecond.

ClusterLogForwarder (CLF):

apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  outputs:
  - elasticsearch:
      version: 6
    limit:
      maxRecordsPerSecond: 10
    name: es-created-by-user
    type: elasticsearch
    url: http://elasticsearch-server-e2e-test-vector-es-namespace-glchn.apps.test.com:80
  pipelines:
  - inputRefs:
    - application
    labels:
      logging-labels: test-labels
    name: forward-to-external-es
    outputRefs:
    - es-created-by-user

Doc count in application logs:

health status index     uuid                   pri rep docs.deleted
yellow open   app-write 0oppTrZtRLa_i1kK-WY31g   5   1            0

(health, status, index, uuid, pri, rep and docs.deleted were identical in all seven samples.)

Sampled (Sun Oct  8 2023, CST)   docs.count   store.size   pri.store.size
15:01:28                         20718        9.5mb        9.5mb
15:02:29                         22529        11.6mb       11.6mb
15:03:29                         24226        13.8mb       13.8mb
15:04:30                         25880        13.4mb       13.4mb
15:05:31                         27572        13.1mb       13.1mb
15:06:31                         29211        15.4mb       15.4mb
15:07:32                         30908        14.7mb       14.7mb

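The doc count increases by roughly 1,700 per minute (between 1,639 and 1,811 per sampling interval), close to 3 * 600 = 1,800, but the expected increase is at most 600 per minute (10 records/s * 60 s).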
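A small Python sketch that recomputes the per-interval increases and the overall rate from the seven doc-count samples above, so they can be compared with the configured limit. The constants are copied from this report; the "per-node" figure only encodes the hypothesis that each worker's collector applies the limit independently, which is inferred from the numbers rather than confirmed.

```python
from datetime import datetime

# (sample time, docs.count) copied from the app-write _cat/indices output above
SAMPLES = [
    ("15:01:28", 20718),
    ("15:02:29", 22529),
    ("15:03:29", 24226),
    ("15:04:30", 25880),
    ("15:05:31", 27572),
    ("15:06:31", 29211),
    ("15:07:32", 30908),
]

MAX_RECORDS_PER_SECOND = 10  # outputs[].limit.maxRecordsPerSecond from the CLF
WORKER_NODES = 3             # worker nodes in the reporter's cluster


def parse(t: str) -> datetime:
    """Parse an HH:MM:SS timestamp (all samples are on the same day)."""
    return datetime.strptime(t, "%H:%M:%S")


def interval_deltas(samples):
    """Return (elapsed seconds, docs added) for each consecutive pair of samples."""
    out = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        secs = (parse(t1) - parse(t0)).total_seconds()
        out.append((secs, c1 - c0))
    return out


def average_rate(samples) -> float:
    """Overall records per second between the first and the last sample."""
    (t0, c0), (t1, c1) = samples[0], samples[-1]
    return (c1 - c0) / (parse(t1) - parse(t0)).total_seconds()


if __name__ == "__main__":
    for secs, added in interval_deltas(SAMPLES):
        print(f"{secs:.0f}s: +{added} docs ({added / secs:.1f}/s)")
    print(f"observed average:    {average_rate(SAMPLES):.1f} records/s")
    print(f"configured limit:    {MAX_RECORDS_PER_SECOND} records/s")
    print(f"per-node hypothesis: {WORKER_NODES * MAX_RECORDS_PER_SECOND} records/s")
```

The observed average lands just under the per-node hypothesis and roughly three times the configured limit.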

My cluster has 3 worker nodes, and each worker node runs some pods that generate application logs:

$ oc get node
NAME                                                  STATUS   ROLES                  AGE     VERSION
qitang-vcmdt-master-0.c.openshift-qe.internal         Ready    control-plane,master   7h6m    v1.27.6+fd4d1f9
qitang-vcmdt-master-1.c.openshift-qe.internal         Ready    control-plane,master   7h5m    v1.27.6+fd4d1f9
qitang-vcmdt-master-2.c.openshift-qe.internal         Ready    control-plane,master   7h5m    v1.27.6+fd4d1f9
qitang-vcmdt-worker-a-8svps.c.openshift-qe.internal   Ready    worker                 6h55m   v1.27.6+fd4d1f9
qitang-vcmdt-worker-b-2zk5m.c.openshift-qe.internal   Ready    worker                 6h55m   v1.27.6+fd4d1f9
qitang-vcmdt-worker-c-24mzf.c.openshift-qe.internal   Ready    worker                 6h55m   v1.27.6+fd4d1f9

$ oc get pod -A -l run=centos-logtest -owide
NAMESPACE                            NAME                           READY   STATUS    RESTARTS   AGE    IP             NODE                                                  NOMINATED NODE   READINESS GATES
e2e-test-vector-es-namespace-gkvrc   logging-centos-logtest-nmxxc   1/1     Running   0          25m    10.131.0.215   qitang-vcmdt-worker-c-24mzf.c.openshift-qe.internal   <none>           <none>
test-1                               json-log-gghks                 1/1     Running   0          20m    10.129.2.155   qitang-vcmdt-worker-a-8svps.c.openshift-qe.internal   <none>           <none>
test-1                               json-log-n8mf9                 1/1     Running   0          20m    10.128.2.195   qitang-vcmdt-worker-b-2zk5m.c.openshift-qe.internal   <none>           <none>
test-1                               json-log-qsprd                 1/1     Running   0          21m    10.131.0.220   qitang-vcmdt-worker-c-24mzf.c.openshift-qe.internal   <none>           <none>
test-2                               json-log-4sh4w                 1/1     Running   0          104m   10.131.0.169   qitang-vcmdt-worker-c-24mzf.c.openshift-qe.internal   <none>           <none>
test-3                               json-log-pf7d5                 1/1     Running   0          104m   10.128.2.173   qitang-vcmdt-worker-b-2zk5m.c.openshift-qe.internal   <none>           <none>
test                                 json-log-1-m5n7n               1/1     Running   0          104m   10.131.0.171   qitang-vcmdt-worker-c-24mzf.c.openshift-qe.internal   <none>           <none>
test                                 json-log-2-hfqw7               1/1     Running   0          104m   10.128.2.174   qitang-vcmdt-worker-b-2zk5m.c.openshift-qe.internal   <none>           <none>
test                                 json-log-3-w4w7t               1/1     Running   0          104m   10.131.0.172   qitang-vcmdt-worker-c-24mzf.c.openshift-qe.internal   <none>           <none>
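To confirm that every worker node hosts at least one log generator (so every node's collector has application logs to forward), the rows of the listing above can be grouped by the NODE column. This is a minimal sketch that assumes the whitespace-separated `-owide` layout shown, where NODE is the eighth field of each data row; it says nothing about how many records per second each generator emits.

```python
from collections import Counter

# Data rows from `oc get pod -A -l run=centos-logtest -owide` above (header dropped).
POD_ROWS = """\
e2e-test-vector-es-namespace-gkvrc   logging-centos-logtest-nmxxc   1/1     Running   0          25m    10.131.0.215   qitang-vcmdt-worker-c-24mzf.c.openshift-qe.internal   <none>           <none>
test-1                               json-log-gghks                 1/1     Running   0          20m    10.129.2.155   qitang-vcmdt-worker-a-8svps.c.openshift-qe.internal   <none>           <none>
test-1                               json-log-n8mf9                 1/1     Running   0          20m    10.128.2.195   qitang-vcmdt-worker-b-2zk5m.c.openshift-qe.internal   <none>           <none>
test-1                               json-log-qsprd                 1/1     Running   0          21m    10.131.0.220   qitang-vcmdt-worker-c-24mzf.c.openshift-qe.internal   <none>           <none>
test-2                               json-log-4sh4w                 1/1     Running   0          104m   10.131.0.169   qitang-vcmdt-worker-c-24mzf.c.openshift-qe.internal   <none>           <none>
test-3                               json-log-pf7d5                 1/1     Running   0          104m   10.128.2.173   qitang-vcmdt-worker-b-2zk5m.c.openshift-qe.internal   <none>           <none>
test                                 json-log-1-m5n7n               1/1     Running   0          104m   10.131.0.171   qitang-vcmdt-worker-c-24mzf.c.openshift-qe.internal   <none>           <none>
test                                 json-log-2-hfqw7               1/1     Running   0          104m   10.128.2.174   qitang-vcmdt-worker-b-2zk5m.c.openshift-qe.internal   <none>           <none>
test                                 json-log-3-w4w7t               1/1     Running   0          104m   10.131.0.172   qitang-vcmdt-worker-c-24mzf.c.openshift-qe.internal   <none>           <none>
"""

# Field order: NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED-NODE READINESS-GATES
NODE_COLUMN = 7


def pods_per_node(rows: str) -> Counter:
    """Count log-generator pods per node from whitespace-separated -owide rows."""
    counts = Counter()
    for line in rows.splitlines():
        fields = line.split()
        if len(fields) > NODE_COLUMN:
            counts[fields[NODE_COLUMN]] += 1
    return counts


if __name__ == "__main__":
    for node, n in sorted(pods_per_node(POD_ROWS).items()):
        print(f"{node}: {n} generator pod(s)")
```

All three workers appear in the result, which fits the observation that the total rate scales with the number of worker nodes.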

Version-Release number of selected component (if applicable):

openshift-logging/cluster-logging-rhel9-operator/images/v5.8.0-177

openshift-logging/vector-rhel9/images/v0.28.1-30

How reproducible:

Always

Steps to Reproduce:

1. Deploy some pods that generate application logs on the worker nodes.
2. Create the CLF with the YAML above.
3. Monitor the doc count in the log store.
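Step 3 can be scripted. The sketch below polls the Elasticsearch `_cat/indices/<index>?h=docs.count` endpoint and reports the highest per-interval rate. The helper names (`es_doc_count`, `monitor`, `max_rate`) are hypothetical, and the clock and sleep functions are injectable so the loop can be exercised without a live cluster.

```python
import time
import urllib.request
from typing import Callable, List, Tuple


def es_doc_count(base_url: str, index: str) -> int:
    """Read docs.count for one index via the Elasticsearch _cat/indices API."""
    url = f"{base_url}/_cat/indices/{index}?h=docs.count"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return int(resp.read().decode().strip())


def monitor(fetch: Callable[[], int], samples: int, interval_s: float,
            clock: Callable[[], float] = time.monotonic,
            sleep: Callable[[float], None] = time.sleep) -> List[Tuple[float, int]]:
    """Call fetch() `samples` times, `interval_s` apart; return (timestamp, count) pairs."""
    points = []
    for i in range(samples):
        points.append((clock(), fetch()))
        if i < samples - 1:
            sleep(interval_s)
    return points


def max_rate(points: List[Tuple[float, int]]) -> float:
    """Highest records/second between consecutive samples (needs at least two)."""
    return max((c1 - c0) / (t1 - t0)
               for (t0, c0), (t1, c1) in zip(points, points[1:]))
```

For example, `monitor(lambda: es_doc_count(base, "app-write"), samples=7, interval_s=60)`, with `base` set to the output URL from the CLF, mirrors the once-a-minute sampling used in this report; passing the result to `max_rate` and comparing it with `maxRecordsPerSecond` shows whether the limit was exceeded in any interval.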

Actual results:

outputs[].limit.maxRecordsPerSecond is always exceeded: the actual rate is $count-of-worker-nodes * maxRecordsPerSecond.
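The arithmetic behind the actual and expected results, using the 3 worker nodes and the doc-count samples above (364 s between the first sample at 15:01:28 and the last at 15:07:32). The first line is the per-node pattern described in this report, not a confirmed description of how the collector enforces the limit.

```latex
\begin{align*}
R_{\text{actual}} &\approx N_{\text{worker}} \times \mathtt{maxRecordsPerSecond} = 3 \times 10 = 30~\text{records/s} \quad (1800~\text{records/min}) \\
R_{\text{expected}} &\le \mathtt{maxRecordsPerSecond} = 10~\text{records/s} \quad (600~\text{records/min}) \\
R_{\text{measured}} &= \frac{30908 - 20718}{364~\text{s}} = \frac{10190}{364~\text{s}} \approx 28.0~\text{records/s}
\end{align*}
```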

Expected results:

The total rate of records delivered to the output should not exceed outputs[].limit.maxRecordsPerSecond.

Additional info:

Links to:
openshift/cluster-logging-operator#2204: Revert 4568

Mentioned on:
Merge request - Updated US source to: 919f685 Merge pull request #2205 from syedriko/syedriko-log-4591
