Type: Bug
Status: NEW
Resolution: Unresolved
Priority: Major
Severity: Important
Affects Versions: Logging 6.0.z, Logging 6.1.z, Logging 6.3.z, Logging 6.4.z
Component: Incidents & Support
Release Note Type: Bug Fix
Description of problem:
- Vector is unable to send large log lines logged by the application to the external Splunk destination.
- Even after setting `spec.tuning.maxWrite` to a smaller size, the payload still reaches roughly 30 MB on the Splunk endpoint side.
- Restarting the collector pods or rebooting the underlying node has not improved the situation.
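For reference, a minimal sketch of where the maxWrite knob sits in the observability.openshift.io/v1 API; tuning is configured per output rather than at the top level of spec, and the 10M value below is purely illustrative, not the exact size that was tried:
spec:
  outputs:
  - name: splunk-logstore
    type: splunk
    tuning:
      # Caps the payload of a single send operation to this output;
      # the 10M value here is an illustrative assumption.
      maxWrite: 10M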
Version-Release number of selected component (if applicable):
Red Hat OpenShift Logging v6
How reproducible:
Steps to Reproduce:
1. Deploy an application that generates large log lines (a hypothetical example pod is sketched after these steps).
2. Install the Red Hat OpenShift Logging operator v6.y.z.
3. Create a secret `splunk-secret` in the `openshift-logging` namespace with the correct `hecToken` from the Splunk side (an example manifest is shown after these steps).
4. Create a ServiceAccount, bind the cluster role, and add the additional roles to the collector service account:
$ oc -n openshift-logging create serviceaccount collector
$ oc -n openshift-logging adm policy add-cluster-role-to-user logging-collector-logs-writer -z collector
$ oc -n openshift-logging adm policy add-cluster-role-to-user collect-application-logs -z collector
5. Create a ClusterLogForwarder configuration:
apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: collector
  namespace: openshift-logging
spec:
  managementState: Managed
  outputs:
  - name: splunk-logstore
    splunk:
      authentication:
        token:
          key: hecToken
          secretName: splunk-secret
      url: 'https://splunk-default-service.splunk-aosqe.svc:8088'
    tls:
      ca:
        key: ca-bundle.crt
        secretName: splunk-secret
    type: splunk
  pipelines:
  - inputRefs:
    - application
    name: forward-log-splunk
    outputRefs:
    - splunk-logstore
  serviceAccount:
    name: collector
6. Check the collector pod status in the openshift-logging namespace.
7. Check the collector pod logs.
8. Check for the `Dropping malformed HEC event` error logs on the Splunk side.
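For step 1, a hypothetical pod along these lines can be used to generate oversized log lines (the name, namespace, and the 10 MiB line size are illustrative assumptions; any small image with a shell works):
apiVersion: v1
kind: Pod
metadata:
  name: large-line-logger          # hypothetical name
  namespace: test-logging          # hypothetical namespace
spec:
  containers:
  - name: logger
    image: registry.access.redhat.com/ubi9/ubi-minimal
    command: ["/bin/sh", "-c"]
    # Emit one ~10 MiB line of 'x' characters every 5 seconds; the container
    # runtime writes it as 16 KiB partial chunks that the collector merges back.
    args:
    - while true; do head -c 10485760 /dev/zero | tr '\0' 'x'; echo; sleep 5; done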
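For step 3, the secret can be written as a manifest along these lines (the key names match what the ClusterLogForwarder in step 5 references; the token and certificate values are placeholders):
apiVersion: v1
kind: Secret
metadata:
  name: splunk-secret
  namespace: openshift-logging
type: Opaque
stringData:
  hecToken: <hec_token_from_splunk>   # placeholder
  ca-bundle.crt: |                    # placeholder CA certificate for the HEC endpoint
    -----BEGIN CERTIFICATE-----
    ...
    -----END CERTIFICATE-----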
Actual results:
- Check the collector pod logs:
$ oc logs <collector_pod_name> -n openshift-logging
YYYY-MM-DDTHH:MM:SS.XXXXXXXZ WARN sink{component_kind="sink" component_id=output_splunk-logstore component_type=splunk_hec_logs}: vector::internal_events::http_client: HTTP error. error=connection closed before message completed error_type="request_failed" stage="processing" internal_log_rate_limit=true
YYYY-MM-DDTHH:MM:SS.XXXXXXXZ WARN sink{component_kind="sink" component_id=output_splunk-logstore component_type=splunk_hec_logs}: vector::sinks::util::retries: Retrying after error. error=Failed to make HTTP(S) request: connection closed before message completed internal_log_rate_limit=true
- Check for the following malformed logs on the Splunk side:
( [-] channel: input:HEC_OCP cid: w29 cribl_cluster: HEC-OCP-Prod ioName: splunk_hec ioType: source level: warn message: Dropping malformed HEC event, enable debug to see details old_idx: splunk_admin size: 29390416 snippet: {"event": {"hostname": "xxx-xxx.xxxx.xxx.../xxxxx...} )
Expected results:
- Vector should drop extremely large log lines emitted by the application before processing them, rather than forwarding oversized events that Splunk rejects.
- How can Vector be tuned to handle the large payload size, so that all application logs reach Splunk without the "Dropping malformed HEC event" error?
Additional Info:
- Here is the upstream documentation link for reference:
[1] https://vector.dev/docs/reference/configuration/sources/kubernetes_logs/#max_merged_line_bytes
- The main concern is how to use the `max_merged_line_bytes` parameter with Vector.
- This parameter sets the maximum number of bytes a line can contain, after merging, before it is discarded.
- This protects against malformed lines or tailing incorrect files.
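For illustration only, this is roughly how the option from [1] looks in a raw Vector configuration. This is a sketch under assumptions: in OpenShift Logging the collector's vector.yaml is generated by the operator rather than edited by hand, the source name below is hypothetical, and the 1 MiB limit is an assumed value:
sources:
  kubernetes_logs_app:              # hypothetical source name
    type: kubernetes_logs
    # Upstream default limit on a single read, before partial-line merging.
    max_line_bytes: 32768
    # Assumed value: discard any line larger than 1 MiB after merging,
    # instead of shipping an oversized event to the sink.
    max_merged_line_bytes: 1048576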