OpenShift Logging / LOG-4592

Collector pods don't know how many records the other collector pods have forwarded, so outputs[].limit.maxRecordsPerSecond is exceeded.

    • NEW
    • VERIFIED
    • Release Note Not Required
    • Log Collection - Sprint 243

      Description of problem:

      When outputs[].limit.maxRecordsPerSecond is set on an output and the doc count of application logs is monitored in the log store, the overall ingest rate is always equal to $count-of-worker-nodes*maxRecordsPerSecond instead of maxRecordsPerSecond.

      CLF:

      apiVersion: logging.openshift.io/v1
      kind: ClusterLogForwarder
      metadata:
        name: instance
        namespace: openshift-logging
      spec:
        outputs:
        - elasticsearch:
            version: 6
          limit:
            maxRecordsPerSecond: 10
          name: es-created-by-user
          type: elasticsearch
          url: http://elasticsearch-server-e2e-test-vector-es-namespace-glchn.apps.test.com:80
        pipelines:
        - inputRefs:
          - application
          labels:
            logging-labels: test-labels
          name: forward-to-external-es
          outputRefs:
          - es-created-by-user

      Doc count in application logs:

      Sun Oct  8 15:01:28 CST 2023
      health status index     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
      yellow open   app-write 0oppTrZtRLa_i1kK-WY31g   5   1      20718            0      9.5mb          9.5mb
      
      
      Sun Oct  8 15:02:29 CST 2023
      health status index     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
      yellow open   app-write 0oppTrZtRLa_i1kK-WY31g   5   1      22529            0     11.6mb         11.6mb
      
      
      Sun Oct  8 15:03:29 CST 2023
      health status index     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
      yellow open   app-write 0oppTrZtRLa_i1kK-WY31g   5   1      24226            0     13.8mb         13.8mb
      
      
      Sun Oct  8 15:04:30 CST 2023
      health status index     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
      yellow open   app-write 0oppTrZtRLa_i1kK-WY31g   5   1      25880            0     13.4mb         13.4mb
      
      
      Sun Oct  8 15:05:31 CST 2023
      health status index     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
      yellow open   app-write 0oppTrZtRLa_i1kK-WY31g   5   1      27572            0     13.1mb         13.1mb
      
      
      Sun Oct  8 15:06:31 CST 2023
      health status index     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
      yellow open   app-write 0oppTrZtRLa_i1kK-WY31g   5   1      29211            0     15.4mb         15.4mb
      
      
      Sun Oct  8 15:07:32 CST 2023
      health status index     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
      yellow open   app-write 0oppTrZtRLa_i1kK-WY31g   5   1      30908            0     14.7mb         14.7mb

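      The doc count increases by roughly 1,600 to 1,800 per minute (10,190 documents in 364 seconds, about 28 per second), but the expected increase is 600 per minute (10 records per second).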
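The arithmetic behind this observation can be checked with a short script. This is a minimal sketch: the sample values are copied from the output above, while the function name `observed_rate` and the constant names are only illustrative.

```python
from datetime import datetime

# (timestamp, docs.count) pairs copied from the samples above.
samples = [
    ("15:01:28", 20718),
    ("15:02:29", 22529),
    ("15:03:29", 24226),
    ("15:04:30", 25880),
    ("15:05:31", 27572),
    ("15:06:31", 29211),
    ("15:07:32", 30908),
]

MAX_RECORDS_PER_SECOND = 10  # outputs[].limit.maxRecordsPerSecond from the CLF
WORKER_NODES = 3             # worker nodes in the test cluster


def observed_rate(samples):
    """Average records per second between the first and last sample."""
    fmt = "%H:%M:%S"
    t0 = datetime.strptime(samples[0][0], fmt)
    t1 = datetime.strptime(samples[-1][0], fmt)
    elapsed = (t1 - t0).total_seconds()
    return (samples[-1][1] - samples[0][1]) / elapsed


rate = observed_rate(samples)
print(f"observed: {rate:.1f} records/s")
print(f"configured limit: {MAX_RECORDS_PER_SECOND} records/s")
print(f"limit x worker nodes: {MAX_RECORDS_PER_SECOND * WORKER_NODES} records/s")
```

The observed average is close to the limit multiplied by the number of worker nodes, not to the limit itself.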

      My cluster has 3 worker nodes, and each worker node runs some pods that generate application logs:

      $ oc get node
      NAME                                                  STATUS   ROLES                  AGE     VERSION
      qitang-vcmdt-master-0.c.openshift-qe.internal         Ready    control-plane,master   7h6m    v1.27.6+fd4d1f9
      qitang-vcmdt-master-1.c.openshift-qe.internal         Ready    control-plane,master   7h5m    v1.27.6+fd4d1f9
      qitang-vcmdt-master-2.c.openshift-qe.internal         Ready    control-plane,master   7h5m    v1.27.6+fd4d1f9
      qitang-vcmdt-worker-a-8svps.c.openshift-qe.internal   Ready    worker                 6h55m   v1.27.6+fd4d1f9
      qitang-vcmdt-worker-b-2zk5m.c.openshift-qe.internal   Ready    worker                 6h55m   v1.27.6+fd4d1f9
      qitang-vcmdt-worker-c-24mzf.c.openshift-qe.internal   Ready    worker                 6h55m   v1.27.6+fd4d1f9
      
      $ oc get pod -A -l run=centos-logtest -owide
      NAMESPACE                            NAME                           READY   STATUS    RESTARTS   AGE    IP             NODE                                                  NOMINATED NODE   READINESS GATES
      e2e-test-vector-es-namespace-gkvrc   logging-centos-logtest-nmxxc   1/1     Running   0          25m    10.131.0.215   qitang-vcmdt-worker-c-24mzf.c.openshift-qe.internal   <none>           <none>
      test-1                               json-log-gghks                 1/1     Running   0          20m    10.129.2.155   qitang-vcmdt-worker-a-8svps.c.openshift-qe.internal   <none>           <none>
      test-1                               json-log-n8mf9                 1/1     Running   0          20m    10.128.2.195   qitang-vcmdt-worker-b-2zk5m.c.openshift-qe.internal   <none>           <none>
      test-1                               json-log-qsprd                 1/1     Running   0          21m    10.131.0.220   qitang-vcmdt-worker-c-24mzf.c.openshift-qe.internal   <none>           <none>
      test-2                               json-log-4sh4w                 1/1     Running   0          104m   10.131.0.169   qitang-vcmdt-worker-c-24mzf.c.openshift-qe.internal   <none>           <none>
      test-3                               json-log-pf7d5                 1/1     Running   0          104m   10.128.2.173   qitang-vcmdt-worker-b-2zk5m.c.openshift-qe.internal   <none>           <none>
      test                                 json-log-1-m5n7n               1/1     Running   0          104m   10.131.0.171   qitang-vcmdt-worker-c-24mzf.c.openshift-qe.internal   <none>           <none>
      test                                 json-log-2-hfqw7               1/1     Running   0          104m   10.128.2.174   qitang-vcmdt-worker-b-2zk5m.c.openshift-qe.internal   <none>           <none>
      test                                 json-log-3-w4w7t               1/1     Running   0          104m   10.131.0.172   qitang-vcmdt-worker-c-24mzf.c.openshift-qe.internal   <none>           <none> 

      Version-Release number of selected component (if applicable):

      openshift-logging/cluster-logging-rhel9-operator/images/v5.8.0-177

      openshift-logging/vector-rhel9/images/v0.28.1-30

      How reproducible:

      Always

      Steps to Reproduce:

      1. Deploy some pods to generate logs
      2. Create a ClusterLogForwarder (CLF) with the YAML above
      3. Monitor doc count in log store
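Step 3 can be scripted if the index listing is captured as text. The following is a minimal sketch that extracts docs.count from output in the same column layout as the samples in the description. The helper name `docs_count` is hypothetical, and the sketch assumes the header row is present and that every row has a value in every column.

```python
def docs_count(cat_output, index):
    """Return docs.count for `index` from a whitespace-separated index listing."""
    lines = [ln.split() for ln in cat_output.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    name_col = header.index("index")        # column holding the index name
    count_col = header.index("docs.count")  # column holding the document count
    for row in rows:
        if row[name_col] == index:
            return int(row[count_col])
    raise KeyError(index)


# One sample copied from the description.
sample = """\
health status index     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   app-write 0oppTrZtRLa_i1kK-WY31g   5   1      20718            0      9.5mb          9.5mb
"""

print(docs_count(sample, "app-write"))
```

Calling the helper on two listings taken a known number of seconds apart and dividing the difference by that interval gives the rate to compare against maxRecordsPerSecond.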

      Actual results:

      The outputs[].limit.maxRecordsPerSecond limit is always exceeded; the actual rate is $count-of-worker-nodes*maxRecordsPerSecond.
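This result is consistent with each collector pod enforcing the limit on its own, with no counter shared between pods. The sketch below is a simplified model of that behavior, not the collector's actual throttling code. The function name and the offered load figures are hypothetical.

```python
def forwarded_per_second(offered_per_node, limit):
    """Records per second reaching the output when every collector
    applies `limit` independently, without a shared counter."""
    return sum(min(offered, limit) for offered in offered_per_node)


# Hypothetical offered load (records/s) on the three worker nodes.
offered = [50, 40, 80]
limit = 10  # outputs[].limit.maxRecordsPerSecond

total = forwarded_per_second(offered, limit)
print(total)  # 30, i.e. 3 worker nodes * 10 records/s
```

Whenever every node offers at least `limit` records per second, the aggregate in this model is the node count multiplied by the limit, which matches the reported behavior.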

      Expected results:

      The outputs[].limit.maxRecordsPerSecond shouldn't be exceeded.

      Additional info:

            jcantril@redhat.com Jeffrey Cantrill
            qitang@redhat.com Qiaoling Tang