-
Bug
-
Resolution: Not a Bug
-
Normal
-
None
-
None
-
None
-
Incidents & Support
-
False
-
-
False
-
NEW
-
NEW
-
Bug Fix
-
Logging - Sprint 276, Logging - Sprint 277
-
Moderate
Description of problem:
Our customer found that vector always abandons a log event when syslog server got restarted.
"Events dropped" is recorded everytime syslog server got restarted.
2025-08-21T10:49:52.082128Z ERROR sink{component_kind="sink" component_id=output_external_rsyslog component_type=socket}: vector::internal_events::socket: Error sending data. error=Transport endpoint is not connected (os error 107) error_code="socket_send" error_type="writer_failed" stage="sending" mode=tcp internal_log_rate_limit=true 2025-08-21T10:49:52.082202Z ERROR sink{component_kind="sink" component_id=output_external_rsyslog component_type=socket}: vector_common::internal_event::component_events_dropped: Events dropped intentional=false count=1 reason="Error sending data." internal_log_rate_limit=true
Vector detects a TCP error, closes the connection, and attempts to reconnect. It's okay.
However, a log event, which vector attempted to send with the old connection, doesn't needed to be abandonded. Because this is a retriable error.
It should be kept and be re-sent after the reconnection.
Version-Release number of selected component (if applicable):
- OpenShift v4.18.9
- cluster-logging.v6.2.2
How reproducible:
Always
Steps to Reproduce:
- Step1. Deploy rsyslog-server pod on your cluster for the test (Or, you can use external syslog server)
apiVersion: v1 kind: Namespace metadata: name: syslog --- apiVersion: apps/v1 kind: Deployment metadata: annotation: kubectl.kubernetes.io/default-container: "rsyslog-server" name: rsyslog-server labels: app: rsyslog-server namespace: syslog spec: replicas: 1 selector: matchLabels: app: rsyslog-server template: metadata: labels: app: rsyslog-server spec: shareProcessNamespace: true containers: - name: rsyslog-server image: registry.redhat.io/rhel9/rsyslog:9.6-1754871563 ports: - containerPort: 55514 name: tcp protocol: TCP - containerPort: 55514 name: udp protocol: UDP volumeMounts: - mountPath: /etc/rsyslog name: rsyslog-conf - mountPath: /log name: log command: ["/usr/sbin/rsyslogd", "-i", "/tmp/rsyslog.pid", "-n", '-f', '/etc/rsyslog/rsyslog.conf'] - name: cleaner image: image-registry.openshift-image-registry.svc:5000/openshift/cli:latest volumeMounts: - mountPath: /log name: log command: ["/bin/bash", "-c"] args: - | #!/bin/bash set -u TOTAL_SIZE_MAX=$(expr 1024 \* 1024 \* 500) trap 'echo "$(date): Received SIGTERM."; pkill -P 1; exit 0' TERM while : do filenum=$(ls /log/*.log 2> /dev/null | wc -l) if [ "$filenum" -gt 0 ] then rotate_size=$(expr $TOTAL_SIZE_MAX \/ $filenum) for log in $(ls /log/*.log 2> /dev/null) do size=$(stat --printf="%s" $log) [ "$size" -ge "$rotate_size" ] && echo "Rotate $log with $size bytes" && mv -f ${log} ${log}.old && pkill -SIGHUP syslog && rm -f ${log}.old.gz && gzip ${log}.old done fi sleep 5 & wait done volumes: - name: rsyslog-conf configMap: name: rsyslog-conf - emptyDir: sizeLimit: 1Gi name: log --- apiVersion: v1 kind: ConfigMap metadata: name: rsyslog-conf labels: app: rsyslog-server namespace: syslog data: rsyslog.conf: | $ModLoad imuxsock $SystemLogSocketName /var/lib/rsyslog/rsyslog.sock $ModLoad imudp $UDPServerRun 55514 $ModLoad imtcp $InputTCPServerRun 55514 template(name="DebugJsonFormat" type="list") { constant(value="{") property(name="timegenerated" format="jsonf") constant(value=",") property(name="timereported" format="jsonf") constant(value=",") property(name="hostname" format="jsonf") constant(value=",") property(name="syslogfacility-text" format="jsonf") constant(value=",") property(name="syslogpriority-text" format="jsonf") constant(value=",") property(name="syslogtag" format="jsonf") constant(value=",") property(name="programname" format="jsonf") constant(value=",") property(name="app-name" format="jsonf") constant(value=",") property(name="procid" format="jsonf") constant(value=",") property(name="msgid" format="jsonf") constant(value=",") property(name="msg" format="jsonf") constant(value="}\n") } # For fluentd :syslogtag, isequal, "openshift-logging-audit:" action(type="omfile" File="/log/openshift-audit.log" Template="DebugJsonFormat") &stop :syslogtag, isequal, "openshift-logging-app:" action(type="omfile" File="/log/openshift-app-container.log" Template="DebugJsonFormat") &stop :syslogtag, isequal, "openshift-logging-infra:" action(type="omfile" File="/log/openshift-infra-container.log" Template="DebugJsonFormat") &stop # For vector :syslogtag, isequal, "audit:" action(type="omfile" File="/log/openshift-audit.log" Template="DebugJsonFormat") &stop :syslogtag, isequal, "application:" action(type="omfile" File="/log/openshift-app-container.log" Template="DebugJsonFormat") &stop :syslogtag, isequal, "infrastructure:" action(type="omfile" File="/log/openshift-infra-container.log" Template="DebugJsonFormat") &stop :syslogtag, isequal, "eventrouter:" action(type="omfile" File="/log/openshift-eventrouter.log" Template="DebugJsonFormat") &stop # For fluentd :app-name, isequal, "openshift-logging-audit" action(type="omfile" File="/log/openshift-audit.log" Template="DebugJsonFormat") &stop :app-name, isequal, "openshift-logging-app" action(type="omfile" File="/log/openshift-app-container.log" Template="DebugJsonFormat") &stop :app-name, isequal, "openshift-logging-infra" action(type="omfile" File="/log/openshift-infra-container.log" Template="DebugJsonFormat") &stop # For vector :app-name, isequal, "audit" action(type="omfile" File="/log/openshift-audit.log" Template="DebugJsonFormat") &stop :app-name, isequal, "application" action(type="omfile" File="/log/openshift-app-container.log" Template="DebugJsonFormat") &stop :app-name, isequal, "infrastructure" action(type="omfile" File="/log/openshift-infra-container.log" Template="DebugJsonFormat") &stop :app-name, isequal, "eventrouter" action(type="omfile" File="/log/openshift-eventrouter.log" Template="DebugJsonFormat") &stop # Journal log action(type="omfile" File="/log/openshift-journal.log" Template="DebugJsonFormat") &stop --- apiVersion: v1 kind: Service metadata: name: rsyslog-server-service labels: app: rsyslog-server namespace: syslog spec: type: NodePort ports: - name: udp port: 514 protocol: UDP targetPort: 55514 nodePort: 32514 - name: tcp port: 514 protocol: TCP targetPort: 55514 nodePort: 32514 selector: app: rsyslog-server
- Step2. Define a CLF as to forward pod logs to your rsyslog server.
--- apiVersion: v1 kind: ServiceAccount metadata: name: collector-syslog namespace: openshift-logging --- apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: name: collect-application-logs-to-collector-syslog roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: collect-application-logs subjects: - kind: ServiceAccount name: collector-syslog namespace: openshift-logging --- apiVersion: observability.openshift.io/v1 kind: ClusterLogForwarder metadata: name: syslog namespace: openshift-logging spec: inputs: - name: test-syslog application: includes: - namespace: "test-syslog" type: application collector: nodeSelector: node-role.kubernetes.io/worker: "" resources: limits: cpu: 1 memory: 2Gi requests: cpu: 100m memory: 64Mi managementState: Managed outputs: - name: external-rsyslog syslog: appName: '{.systemd.u.SYSLOG_IDENTIFIER||.log_type||"-"}' enrichment: KubernetesMinimal facility: local0 msgId: '{.systemd.u.MESSAGE_ID||"-"}' payloadKey: '{.message}' procId: '{.systemd.t.PID||"-"}' rfc: RFC5424 severity: informational tuning: deliveryMode: AtLeastOnce url: tcp://rsyslog-server-service.syslog.svc.cluster.local:514 type: syslog pipelines: - inputRefs: - test-syslog name: app outputRefs: - external-rsyslog serviceAccount: name: collector-syslog
- Step3. Deploy a test application which generates a message per sec.
$ oc create namespace test-syslog $ oc run --image=image-registry.openshift-image-registry.svc:5000/openshift/cli -n test-syslog test-syslog -- /bin/bash -c "while :; do date; sleep 1; done"
- Step4. Check if rsyslog-server pod receives a test application's message.
$ oc exec -n syslog deployment/rsyslog-server -c cleaner -- tail -n 1 -f /log/openshift-app-container.log {"timegenerated":"Aug 21 10:48:26","timereported":"Aug 21 10:48:26","hostname":"worker2.example.com","syslogfacility-text":"local0","syslogpriority-text":"info","syslogtag":"application","programname":"application","app-name":"application","procid":"-","msgid":"-","msg":"namespace_name=test-syslog, container_name=test-syslog, pod_name=test-syslog, message=Thu Aug 21 10:48:26 UTC 2025"}
- Step5. Send SIGTERM to the rsyslog process to restart it gracefully.
$ oc exec -n syslog deployment/rsyslog-server -c rsyslog-server -- /bin/bash -c "kill -SIGTERM 2"
- Step6. Check logs of vector and rsyslog.
$ for pod in $(oc get pods -n openshift-logging -l app.kubernetes.io/component=collector --no-headers | awk '{ print $1}') do oc logs -n openshift-logging $pod | grep -B 1 "component_events_dropped" done $ oc exec -n syslog deployment/rsyslog-server -- cat /log/openshift-app-container.log | less
Actual results:
"Events dropped" was recorded when rsyslog process got restarted.
$ for pod in $(oc get pods -n openshift-logging -l app.kubernetes.io/component=collector --no-headers | awk '{ print $1}') do oc logs -n openshift-logging $pod | grep -B 1 "component_events_dropped" done 2025-08-21T10:49:52.082128Z ERROR sink{component_kind="sink" component_id=output_external_rsyslog component_type=socket}: vector::internal_events::socket: Error sending data. error=Transport endpoint is not connected (os error 107) error_code="socket_send" error_type="writer_failed" stage="sending" mode=tcp internal_log_rate_limit=true 2025-08-21T10:49:52.082202Z ERROR sink{component_kind="sink" component_id=output_external_rsyslog component_type=socket}: vector_common::internal_event::component_events_dropped: Events dropped intentional=false count=1 reason="Error sending data." internal_log_rate_limit=true
From rsyslog's log, you can notice that one record ("10:49:51" in this time) is missing.
$ oc exec -n syslog deployment/rsyslog-server -- cat /log/openshift-app-container.log | less ... {"timegenerated":"Aug 21 10:49:49","timereported":"Aug 21 10:49:49","hostname":"worker2.example.com","syslogfacility-text":"local0","syslogpriority-text":"info","syslogtag":"application","programname":"application","app-name":"application","procid":"-","msgid":"-","msg":"namespace_name=test-syslog, container_name=test-syslog, pod_name=test-syslog, message=Thu Aug 21 10:49:49 UTC 2025"} {"timegenerated":"Aug 21 10:49:51","timereported":"Aug 21 10:49:50","hostname":"worker2.example.com","syslogfacility-text":"local0","syslogpriority-text":"info","syslogtag":"application","programname":"application","app-name":"application","procid":"-","msgid":"-","msg":"namespace_name=test-syslog, container_name=test-syslog, pod_name=test-syslog, message=Thu Aug 21 10:49:50 UTC 2025"} {"timegenerated":"Aug 21 10:49:53","timereported":"Aug 21 10:49:52","hostname":"worker2.example.com","syslogfacility-text":"local0","syslogpriority-text":"info","syslogtag":"application","programname":"application","app-name":"application","procid":"-","msgid":"-","msg":"namespace_name=test-syslog, container_name=test-syslog, pod_name=test-syslog, message=Thu Aug 21 10:49:52 UTC 2025"} {"timegenerated":"Aug 21 10:49:54","timereported":"Aug 21 10:49:53","hostname":"worker2.example.com","syslogfacility-text":"local0","syslogpriority-text":"info","syslogtag":"application","programname":"application","app-name":"application","procid":"-","msgid":"-","msg":"namespace_name=test-syslog, container_name=test-syslog, pod_name=test-syslog, message=Thu Aug 21 10:49:53 UTC 2025"} ...
Expected results:
The record "10:49:51" isn't dropped.
It is re-sent after the reconnetion.
Additional info:
It looks like that "Events dropped" event is recorded by the following codes.
https://github.com/vectordotdev/vector/blob/0f154d501201ca1d4cfa67585a6d0a9b47bb3ffc/lib/vector-stream/src/driver.rs#L207-L209
A tcp error is a retriable error. So "EventStatus::Rejected" shouldn't be set here.
In my opinion, it should be set only when the log was reached to the destination but rejected by the destination' side.
- relates to
-
LOG-7502 Vector stops of log forwarding when the TCP session is killed
-
- MODIFIED
-