Type: Bug
Status: NEW
Resolution: Unresolved
Priority: Major
Severity: Important
Affects Versions: Logging 6.0.z, Logging 6.1.z, Logging 6.3.z, Logging 6.4.z
Component: Incidents & Support
Release Note Type: Bug Fix
Description of problem:
- Vector is unable to send large log lines logged by the application to the external Splunk destination.
- Even after setting `spec.tuning.maxWrite` to a smaller size, the payload still reaches roughly 30 MB on the Splunk endpoint side.
- Restarting the collector pods or rebooting the underlying node has not improved the situation.
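For reference, a minimal sketch of where the maxWrite knob sits in the observability.openshift.io/v1 API; tuning is configured per output rather than at the top level of spec, and the 10M value below is purely illustrative, not the exact size that was tried:
spec:
  outputs:
  - name: splunk-logstore
    type: splunk
    tuning:
      # Caps the payload of a single send operation to this output;
      # the 10M value here is an illustrative assumption.
      maxWrite: 10M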
Version-Release number of selected component (if applicable):
Red Hat OpenShift Logging v6
How reproducible:
Steps to Reproduce:
1. Deploy an application that generates large log lines (a hypothetical example pod is sketched after these steps).
2. Install the Red Hat OpenShift Logging operator v6.y.z.
3. Create a secret `splunk-secret` in the `openshift-logging` namespace with the correct `hecToken` from the Splunk side (an example manifest is shown after these steps).
4. Create a ServiceAccount, bind the cluster role, and add the additional roles to the collector service account:
$ oc -n openshift-logging create serviceaccount collector
$ oc -n openshift-logging adm policy add-cluster-role-to-user logging-collector-logs-writer -z collector
$ oc -n openshift-logging adm policy add-cluster-role-to-user collect-application-logs -z collector
5. Create a ClusterLogForwarder configuration:
apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: collector
  namespace: openshift-logging
spec:
  managementState: Managed
  outputs:
  - name: splunk-logstore
    splunk:
      authentication:
        token:
          key: hecToken
          secretName: splunk-secret
      url: 'https://splunk-default-service.splunk-aosqe.svc:8088'
    tls:
      ca:
        key: ca-bundle.crt
        secretName: splunk-secret
    type: splunk
  pipelines:
  - inputRefs:
    - application
    name: forward-log-splunk
    outputRefs:
    - splunk-logstore
  serviceAccount:
    name: collector
6. Check the collector pod status in the openshift-logging namespace.
7. Check the collector pod logs.
8. Check for the `Dropping malformed HEC event` error logs on the Splunk side.
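For step 1, a hypothetical pod along these lines can be used to generate oversized log lines (the name, namespace, and the 10 MiB line size are illustrative assumptions; any small image with a shell works):
apiVersion: v1
kind: Pod
metadata:
  name: large-line-logger          # hypothetical name
  namespace: test-logging          # hypothetical namespace
spec:
  containers:
  - name: logger
    image: registry.access.redhat.com/ubi9/ubi-minimal
    command: ["/bin/sh", "-c"]
    # Emit one ~10 MiB line of 'x' characters every 5 seconds; the container
    # runtime writes it as 16 KiB partial chunks that the collector merges back.
    args:
    - while true; do head -c 10485760 /dev/zero | tr '\0' 'x'; echo; sleep 5; done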
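For step 3, the secret can be written as a manifest along these lines (the key names match what the ClusterLogForwarder in step 5 references; the token and certificate values are placeholders):
apiVersion: v1
kind: Secret
metadata:
  name: splunk-secret
  namespace: openshift-logging
type: Opaque
stringData:
  hecToken: <hec_token_from_splunk>   # placeholder
  ca-bundle.crt: |                    # placeholder CA certificate for the HEC endpoint
    -----BEGIN CERTIFICATE-----
    ...
    -----END CERTIFICATE-----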
Actual results:
- Check the collector pod logs:
$ oc logs <collector_pod_name> -n openshift-logging
YYYY-MM-DDTHH:MM:SS.XXXXXXXZ WARN sink{component_kind="sink" component_id=output_splunk-logstore component_type=splunk_hec_logs}: vector::internal_events::http_client: HTTP error. error=connection closed before message completed error_type="request_failed" stage="processing" internal_log_rate_limit=true
YYYY-MM-DDTHH:MM:SS.XXXXXXXZ WARN sink{component_kind="sink" component_id=output_splunk-logstore component_type=splunk_hec_logs}: vector::sinks::util::retries: Retrying after error. error=Failed to make HTTP(S) request: connection closed before message completed internal_log_rate_limit=true
- Check for the following malformed logs on the Splunk side:
( [-] channel: input:HEC_OCP cid: w29 cribl_cluster: HEC-OCP-Prod ioName: splunk_hec ioType: source level: warn message: Dropping malformed HEC event, enable debug to see details old_idx: splunk_admin size: 29390416 snippet: {"event": {"hostname": "xxx-xxx.xxxx.xxx.../xxxxx...} )
Expected results:
- Vector should drop extremely large log lines emitted by the application before processing them, rather than forwarding oversized events that Splunk rejects.
- How can Vector be tuned to handle the large payload size, so that all application logs reach Splunk without the "Dropping malformed HEC event" error?
Additional Info:
- Here is the upstream documentation link for reference:
[1] https://vector.dev/docs/reference/configuration/sources/kubernetes_logs/#max_merged_line_bytes
- The main concern is how to use the `max_merged_line_bytes` parameter with Vector.
- This parameter sets the maximum number of bytes a line can contain, after merging, before it is discarded.
- This protects against malformed lines or tailing incorrect files.
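For illustration only, this is roughly how the option from [1] looks in a raw Vector configuration. This is a sketch under assumptions: in OpenShift Logging the collector's vector.yaml is generated by the operator rather than edited by hand, the source name below is hypothetical, and the 1 MiB limit is an assumed value:
sources:
  kubernetes_logs_app:              # hypothetical source name
    type: kubernetes_logs
    # Upstream default limit on a single read, before partial-line merging.
    max_line_bytes: 32768
    # Assumed value: discard any line larger than 1 MiB after merging,
    # instead of shipping an oversized event to the sink.
    max_merged_line_bytes: 1048576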