Uploaded image for project: 'OpenShift Logging'
  1. OpenShift Logging
  2. LOG-7602

Vector always abandons a log event when syslog server got restarted

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Not a Bug
    • Icon: Normal Normal
    • None
    • None
    • Log Collection
    • None
    • Incidents & Support
    • False
    • Hide

      None

      Show
      None
    • False
    • NEW
    • NEW
    • Bug Fix
    • Logging - Sprint 276, Logging - Sprint 277
    • Moderate

      Description of problem:

      Our customer found that vector always abandons a log event when syslog server got restarted.
      "Events dropped" is recorded everytime syslog server got restarted.

      2025-08-21T10:49:52.082128Z ERROR sink{component_kind="sink" component_id=output_external_rsyslog component_type=socket}: vector::internal_events::socket: Error sending data. error=Transport endpoint is not connected (os error 107) error_code="socket_send" error_type="writer_failed" stage="sending" mode=tcp internal_log_rate_limit=true
      2025-08-21T10:49:52.082202Z ERROR sink{component_kind="sink" component_id=output_external_rsyslog component_type=socket}: vector_common::internal_event::component_events_dropped: Events dropped intentional=false count=1 reason="Error sending data." internal_log_rate_limit=true
      

      Vector detects a TCP error, closes the connection, and attempts to reconnect. It's okay.
      However, a log event, which vector attempted to send with the old connection, doesn't needed to be abandonded. Because this is a retriable error.
      It should be kept and be re-sent after the reconnection.

      Version-Release number of selected component (if applicable):

      • OpenShift v4.18.9
      • cluster-logging.v6.2.2

      How reproducible:

      Always

      Steps to Reproduce:

      • Step1. Deploy rsyslog-server pod on your cluster for the test (Or, you can use external syslog server)
        apiVersion: v1
        kind: Namespace
        metadata:
          name: syslog
        ---
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          annotation:
            kubectl.kubernetes.io/default-container: "rsyslog-server" 
          name: rsyslog-server
          labels:
            app: rsyslog-server
          namespace: syslog
        spec:
          replicas: 1
          selector:
            matchLabels:
              app: rsyslog-server
          template:
            metadata:
              labels:
                app: rsyslog-server
            spec:
              shareProcessNamespace: true
              containers:
              - name: rsyslog-server
                image: registry.redhat.io/rhel9/rsyslog:9.6-1754871563
                ports:
                - containerPort: 55514
                  name: tcp
                  protocol: TCP
                - containerPort: 55514
                  name: udp
                  protocol: UDP
                volumeMounts:
                - mountPath: /etc/rsyslog
                  name: rsyslog-conf
                - mountPath: /log
                  name: log
                command: ["/usr/sbin/rsyslogd", "-i", "/tmp/rsyslog.pid", "-n", '-f', '/etc/rsyslog/rsyslog.conf']
              - name: cleaner
                image: image-registry.openshift-image-registry.svc:5000/openshift/cli:latest
                volumeMounts:
                - mountPath: /log
                  name: log
                command: ["/bin/bash", "-c"]
                args:
                - |
                  #!/bin/bash
                  set -u
                  TOTAL_SIZE_MAX=$(expr 1024 \* 1024 \* 500)
        
                  trap 'echo "$(date): Received SIGTERM."; pkill -P 1; exit 0' TERM
                  while :
                  do
                    filenum=$(ls /log/*.log 2> /dev/null | wc -l)
        
                    if [ "$filenum" -gt 0 ]
                    then
                      rotate_size=$(expr $TOTAL_SIZE_MAX \/ $filenum)
                      for log in $(ls /log/*.log 2> /dev/null)
                      do
                        size=$(stat --printf="%s" $log)
                        [ "$size" -ge "$rotate_size" ] &&
                          echo "Rotate $log with $size bytes" &&
                          mv -f ${log} ${log}.old &&
                          pkill -SIGHUP syslog &&
                          rm -f ${log}.old.gz &&
                          gzip ${log}.old
                      done
                    fi
                    sleep 5 &
                    wait
                  done
              volumes:
              - name: rsyslog-conf
                configMap:
                  name: rsyslog-conf
              - emptyDir:
                  sizeLimit: 1Gi
                name: log
        ---
        apiVersion: v1
        kind: ConfigMap
        metadata:
          name: rsyslog-conf
          labels:
            app: rsyslog-server
          namespace: syslog
        data:
          rsyslog.conf: |
            $ModLoad imuxsock
            $SystemLogSocketName /var/lib/rsyslog/rsyslog.sock
            $ModLoad imudp
            $UDPServerRun 55514
            $ModLoad imtcp
            $InputTCPServerRun 55514
            template(name="DebugJsonFormat" type="list") {
              constant(value="{")
              property(name="timegenerated" format="jsonf")
              constant(value=",")
              property(name="timereported" format="jsonf")
              constant(value=",")
              property(name="hostname" format="jsonf")
              constant(value=",")
              property(name="syslogfacility-text" format="jsonf")
              constant(value=",")
              property(name="syslogpriority-text" format="jsonf")
              constant(value=",")
              property(name="syslogtag" format="jsonf")
              constant(value=",")
              property(name="programname" format="jsonf")
              constant(value=",")
              property(name="app-name" format="jsonf")
              constant(value=",")
              property(name="procid" format="jsonf")
              constant(value=",")
              property(name="msgid" format="jsonf")
              constant(value=",")
              property(name="msg" format="jsonf")
              constant(value="}\n")
            }
            # For fluentd
            :syslogtag, isequal, "openshift-logging-audit:" action(type="omfile" File="/log/openshift-audit.log" Template="DebugJsonFormat")
            &stop
            :syslogtag, isequal, "openshift-logging-app:" action(type="omfile" File="/log/openshift-app-container.log" Template="DebugJsonFormat")
            &stop
            :syslogtag, isequal, "openshift-logging-infra:" action(type="omfile" File="/log/openshift-infra-container.log" Template="DebugJsonFormat")
            &stop
        
            # For vector
            :syslogtag, isequal, "audit:" action(type="omfile" File="/log/openshift-audit.log" Template="DebugJsonFormat")
            &stop
            :syslogtag, isequal, "application:" action(type="omfile" File="/log/openshift-app-container.log" Template="DebugJsonFormat")
            &stop
            :syslogtag, isequal, "infrastructure:" action(type="omfile" File="/log/openshift-infra-container.log" Template="DebugJsonFormat")
            &stop
            :syslogtag, isequal, "eventrouter:" action(type="omfile" File="/log/openshift-eventrouter.log" Template="DebugJsonFormat")
            &stop
        
            # For fluentd
            :app-name, isequal, "openshift-logging-audit" action(type="omfile" File="/log/openshift-audit.log" Template="DebugJsonFormat")
            &stop
            :app-name, isequal, "openshift-logging-app" action(type="omfile" File="/log/openshift-app-container.log" Template="DebugJsonFormat")
            &stop
            :app-name, isequal, "openshift-logging-infra" action(type="omfile" File="/log/openshift-infra-container.log" Template="DebugJsonFormat")
            &stop
        
            # For vector
            :app-name, isequal, "audit" action(type="omfile" File="/log/openshift-audit.log" Template="DebugJsonFormat")
            &stop
            :app-name, isequal, "application" action(type="omfile" File="/log/openshift-app-container.log" Template="DebugJsonFormat")
            &stop
            :app-name, isequal, "infrastructure" action(type="omfile" File="/log/openshift-infra-container.log" Template="DebugJsonFormat")
            &stop
            :app-name, isequal, "eventrouter" action(type="omfile" File="/log/openshift-eventrouter.log" Template="DebugJsonFormat")
            &stop
        
            # Journal log
            action(type="omfile" File="/log/openshift-journal.log" Template="DebugJsonFormat")
            &stop
        ---
        apiVersion: v1
        kind: Service
        metadata:
          name: rsyslog-server-service
          labels:
            app: rsyslog-server
          namespace: syslog
        spec:
          type: NodePort
          ports:
          - name: udp
            port: 514
            protocol: UDP
            targetPort: 55514
            nodePort: 32514
          - name: tcp
            port: 514
            protocol: TCP
            targetPort: 55514
            nodePort: 32514
          selector:
            app: rsyslog-server
        
      • Step2. Define a CLF as to forward pod logs to your rsyslog server.
        ---
        apiVersion: v1
        kind: ServiceAccount
        metadata:
          name: collector-syslog
          namespace: openshift-logging
        ---
        apiVersion: rbac.authorization.k8s.io/v1
        kind: ClusterRoleBinding
        metadata:
          name: collect-application-logs-to-collector-syslog
        roleRef:
          apiGroup: rbac.authorization.k8s.io
          kind: ClusterRole
          name: collect-application-logs
        subjects:
        - kind: ServiceAccount
          name: collector-syslog
          namespace: openshift-logging
        ---
        apiVersion: observability.openshift.io/v1
        kind: ClusterLogForwarder
        metadata:
          name: syslog
          namespace: openshift-logging
        spec:
          inputs:
          - name: test-syslog
            application:
              includes:
              - namespace: "test-syslog" 
            type: application
          collector:
            nodeSelector:
              node-role.kubernetes.io/worker: "" 
            resources:
              limits:
                cpu: 1
                memory: 2Gi
              requests:
                cpu: 100m
                memory: 64Mi
          managementState: Managed
          outputs:
          - name: external-rsyslog
            syslog:
              appName: '{.systemd.u.SYSLOG_IDENTIFIER||.log_type||"-"}'
              enrichment: KubernetesMinimal
              facility: local0
              msgId: '{.systemd.u.MESSAGE_ID||"-"}'
              payloadKey: '{.message}'
              procId: '{.systemd.t.PID||"-"}'
              rfc: RFC5424
              severity: informational
              tuning:
                deliveryMode: AtLeastOnce
              url: tcp://rsyslog-server-service.syslog.svc.cluster.local:514
            type: syslog
          pipelines:
          - inputRefs:
            - test-syslog
            name: app
            outputRefs:
            - external-rsyslog
          serviceAccount:
            name: collector-syslog
        
      • Step3. Deploy a test application which generates a message per sec.
        $ oc create namespace test-syslog
        $ oc run --image=image-registry.openshift-image-registry.svc:5000/openshift/cli -n test-syslog test-syslog -- /bin/bash -c "while :; do date; sleep 1; done" 
        
      • Step4. Check if rsyslog-server pod receives a test application's message.
        $ oc exec -n syslog deployment/rsyslog-server -c cleaner -- tail -n 1 -f /log/openshift-app-container.log
        {"timegenerated":"Aug 21 10:48:26","timereported":"Aug 21 10:48:26","hostname":"worker2.example.com","syslogfacility-text":"local0","syslogpriority-text":"info","syslogtag":"application","programname":"application","app-name":"application","procid":"-","msgid":"-","msg":"namespace_name=test-syslog, container_name=test-syslog, pod_name=test-syslog, message=Thu Aug 21 10:48:26 UTC 2025"}
        
      • Step5. Send SIGTERM to the rsyslog process to restart it gracefully.
        $ oc exec  -n syslog deployment/rsyslog-server -c rsyslog-server -- /bin/bash -c "kill -SIGTERM 2" 
        
      • Step6. Check logs of vector and rsyslog.
        $ for pod in $(oc get pods -n openshift-logging -l app.kubernetes.io/component=collector --no-headers | awk '{ print $1}')
          do
            oc logs -n openshift-logging $pod | grep -B 1 "component_events_dropped" 
          done
        
        $ oc exec -n syslog deployment/rsyslog-server -- cat /log/openshift-app-container.log | less
        

      Actual results:

      "Events dropped" was recorded when rsyslog process got restarted.

      $ for pod in $(oc get pods -n openshift-logging -l app.kubernetes.io/component=collector --no-headers | awk '{ print $1}')
        do
          oc logs -n openshift-logging $pod | grep -B 1 "component_events_dropped" 
        done
      2025-08-21T10:49:52.082128Z ERROR sink{component_kind="sink" component_id=output_external_rsyslog component_type=socket}: vector::internal_events::socket: Error sending data. error=Transport endpoint is not connected (os error 107) error_code="socket_send" error_type="writer_failed" stage="sending" mode=tcp internal_log_rate_limit=true
      2025-08-21T10:49:52.082202Z ERROR sink{component_kind="sink" component_id=output_external_rsyslog component_type=socket}: vector_common::internal_event::component_events_dropped: Events dropped intentional=false count=1 reason="Error sending data." internal_log_rate_limit=true
      

      From rsyslog's log, you can notice that one record ("10:49:51" in this time) is missing.

      $ oc exec -n syslog deployment/rsyslog-server -- cat /log/openshift-app-container.log | less
      ...
      {"timegenerated":"Aug 21 10:49:49","timereported":"Aug 21 10:49:49","hostname":"worker2.example.com","syslogfacility-text":"local0","syslogpriority-text":"info","syslogtag":"application","programname":"application","app-name":"application","procid":"-","msgid":"-","msg":"namespace_name=test-syslog, container_name=test-syslog, pod_name=test-syslog, message=Thu Aug 21 10:49:49 UTC 2025"}
      {"timegenerated":"Aug 21 10:49:51","timereported":"Aug 21 10:49:50","hostname":"worker2.example.com","syslogfacility-text":"local0","syslogpriority-text":"info","syslogtag":"application","programname":"application","app-name":"application","procid":"-","msgid":"-","msg":"namespace_name=test-syslog, container_name=test-syslog, pod_name=test-syslog, message=Thu Aug 21 10:49:50 UTC 2025"}
      {"timegenerated":"Aug 21 10:49:53","timereported":"Aug 21 10:49:52","hostname":"worker2.example.com","syslogfacility-text":"local0","syslogpriority-text":"info","syslogtag":"application","programname":"application","app-name":"application","procid":"-","msgid":"-","msg":"namespace_name=test-syslog, container_name=test-syslog, pod_name=test-syslog, message=Thu Aug 21 10:49:52 UTC 2025"}
      {"timegenerated":"Aug 21 10:49:54","timereported":"Aug 21 10:49:53","hostname":"worker2.example.com","syslogfacility-text":"local0","syslogpriority-text":"info","syslogtag":"application","programname":"application","app-name":"application","procid":"-","msgid":"-","msg":"namespace_name=test-syslog, container_name=test-syslog, pod_name=test-syslog, message=Thu Aug 21 10:49:53 UTC 2025"}
      ...
      

      Expected results:

      The record "10:49:51" isn't dropped.
      It is re-sent after the reconnetion.

      Additional info:

      It looks like that "Events dropped" event is recorded by the following codes.
      https://github.com/vectordotdev/vector/blob/0f154d501201ca1d4cfa67585a6d0a9b47bb3ffc/lib/vector-stream/src/driver.rs#L207-L209

      A tcp error is a retriable error. So "EventStatus::Rejected" shouldn't be set here.
      In my opinion, it should be set only when the log was reached to the destination but rejected by the destination' side.

              vparfono Vitalii Parfonov
              kkawakam@redhat.com KATSUYA KAWAKAMI
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

                Created:
                Updated:
                Resolved: