OpenShift Logging / LOG-2217

[Vector] Loss of logs when using Vector as collector.


Details

    • Sprint: Logging (Core) - Sprint 214, Logging (Core) - Sprint 215, Logging (Core) - Sprint 216

    Description

      When sending logs at a high volume and rate, a loss of around 50% of the logs is observed.

      Steps to reproduce the issue:

      1. Install the Logging and Elasticsearch operators, version 5.5 (preview).

      2. Create a ClusterLogging instance with Vector as the collector (the full CL instance used is shown under "CL instance" below).

      3. Create a log producer pod which sends 4500000 log lines with a line length of 1024 at a rate of 150000 (see the verification sketch after these commands).

      oc new-project logtesta0
      
      oc label nodes --all placement=logtest
      
      cat cm.yaml 
      
      apiVersion: v1
      data:
        ocp_logtest.cfg: |
          --num-lines 4500000 --line-length 1024 --word-length 9 --rate 150000 --fixed-line
      kind: ConfigMap
      metadata:
        name: logtest-config
        namespace: logtesta0
       
      oc create -f cm.yaml
      
      cat rc.yaml 
      
      apiVersion: v1
      kind: ReplicationController
      metadata:
        generation: 1
        labels:
          run: centos-logtest
          test: centos-logtest
        name: centos-logtest
        namespace: logtesta0
      spec:
        replicas: 1
        selector:
          run: centos-logtest
          test: centos-logtest
        template:
          metadata:
            generateName: centos-logtest-
            labels:
              run: centos-logtest
              test: centos-logtest
          spec:
            containers:
            - image: quay.io/mffiedler/ocp-logtest:latest
              imagePullPolicy: Always
              name: centos-logtest
              resources: {}
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: File
              volumeMounts:
              - mountPath: /var/lib/svt
                name: config
            dnsPolicy: ClusterFirst
            imagePullSecrets:
            - name: default-dockercfg-ukomu
            nodeSelector:
              placement: logtest
            restartPolicy: Always
            schedulerName: default-scheduler
            securityContext: {}
            terminationGracePeriodSeconds: 30
            volumes:
            - configMap:
                defaultMode: 420
                name: logtest-config
              name: config
      
      oc create -f rc.yaml
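      Verification sketch (not part of the original steps): before waiting for ingestion, confirm that the node label was applied and that the log producer pod is running. Labels and namespace are taken from the manifests above.
      
      # Quick checks for the reproducer setup
      oc get nodes -l placement=logtest
      oc -n logtesta0 get pods -l run=centos-logtest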

      4. Wait for all the logs from the logtest pod to be sent to the default ES instance. This should take around 40 minutes.

      Check the log count in the ES instance.

      oc rsh elasticsearch-cdm-2rb3icfi-1-97546dfd-zplpc 
      
      es_util --query=app*/_count -d '{"query":{"wildcard":{"kubernetes.pod_namespace":{"value":"logtesta0*","boost":1,"rewrite":"constant_score"}}}}' 
      
      {"count":2150366,"_shards":{"total":3,"successful":3,"skipped":0,"failed":0}}
      
      sh-4.4$ es_util --query=app*/_count
      {"count":2150366,"_shards":{"total":3,"successful":3,"skipped":0,"failed":0}}
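      To rule out taking the count before indexing has finished, the same query can be re-run until the number stops growing. A sketch using oc exec against the same ES pod; the container name "elasticsearch" is an assumption.
      
      # Poll the app index count once a minute until it stops changing.
      while true; do
        oc -n openshift-logging exec elasticsearch-cdm-2rb3icfi-1-97546dfd-zplpc -c elasticsearch -- \
          es_util --query=app*/_count
        sleep 60
      done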

      The count is around 2150366 while it should be 4500000. We have tested the same scenario with Fluentd, which works as expected.
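      For reference, the loss can be quantified from the two counts above (a small arithmetic sketch, illustration only):
      
      # Expected 4500000 lines, indexed 2150366.
      awk 'BEGIN { printf "loss: %.1f%%\n", 100 * (1 - 2150366 / 4500000) }'
      # prints: loss: 52.2%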

      How reproducible:
      Always. This issue was reported by our Performance team. 

      Cluster config:
      Server Version: 4.10.0-0.nightly-2022-02-09-111355
      Kubernetes Version: v1.23.3+759c22b

      Storage: GP2
      Cluster size: AWS 3 masters and 3 workers m6i.xlarge

      CL instance:

      apiVersion: "logging.openshift.io/v1"
      kind: "ClusterLogging"
      metadata:
        name: "instance" 
        namespace: "openshift-logging"
      spec:
        managementState: "Managed"  
        logStore:
          type: "elasticsearch"  
          retentionPolicy: 
            application:
              maxAge: 7d
            infra:
              maxAge: 7d
            audit:
              maxAge: 7d
          elasticsearch:
            nodeCount: 3 
            storage:
              storageClassName: "gp2" 
              size: 100G
            resources: 
                requests:
                  memory: "1Gi"
            proxy: 
              resources:
                limits:
                  memory: 256Mi
                requests:
                  memory: 256Mi
            redundancyPolicy: "SingleRedundancy"
        visualization:
          type: "kibana"  
          kibana:
            replicas: 1
        collection:
          logs:
            type: "vector"  
            vector: {} 

      Attached are the ES and CL instance status, the collector logs, and the ES logs.
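      To gather the collector logs (as in the attached collector.log), the Vector pods' logs can be dumped. A minimal sketch, assuming the collector pods in openshift-logging carry the component=collector label:
      
      # Save the logs of each collector pod to a local file.
      # The component=collector label is an assumption about how CLO labels the Vector pods.
      for p in $(oc -n openshift-logging get pods -l component=collector -o name); do
        oc -n openshift-logging logs "$p" > "${p#pod/}.log"
      done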

      Attachments

        1. clo_instance.yaml
          3 kB
        2. collector.log
          20.96 MB
        3. elasticsearch.log
          10.30 MB
        4. es_instance.yaml
          3 kB


          People

            sninganu@redhat.com Sachin Ninganure
            rhn-support-ikanse Ishwar Kanse
            Ishwar Kanse Ishwar Kanse
            Votes: 0
            Watchers: 7
