OpenShift Logging / LOG-8302

Vector unable to send large size application payload to External Splunk


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Affects Version/s: Logging 6.1.z, Logging 6.0.z, Logging 6.3.z, Logging 6.4.z
    • Component/s: Log Collection
    • Incidents & Support
    • Release Note Type: Bug Fix
    • Severity: Important

      Description of problem:

      • Vector is unable to send large log lines emitted by the application to the external Splunk destination.
      • Even after setting `spec.tuning.maxWrite` to a smaller value, the payload arriving at the Splunk endpoint still reaches about 30 MB.
      • Restarting the collector pods or rebooting the underlying node has not improved the situation.
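
      Note: in the observability.openshift.io/v1 schema the delivery tuning appears to live on the output itself rather than at `spec.tuning`. A minimal sketch of the relevant fragment, assuming `outputs[].splunk.tuning.maxWrite` and using `1M` purely as an example value:

      spec:
        outputs:
          - name: splunk-logstore
            type: splunk
            splunk:
              url: 'https://splunk-default-service.splunk-aosqe.svc:8088'
              tuning:
                # Upper bound on the payload size of a single write to the
                # Splunk HEC endpoint (example value, not a recommendation).
                maxWrite: 1M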

      Version-Release number of selected component (if applicable):

      Red Hat OpenShift Logging v6

      How reproducible:

      Steps to Reproduce:

      1. Deploy an application that generates large log lines.
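
      For example, a minimal workload of this kind (image, namespace, message size, and interval are all illustrative, not taken from this report):

      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: large-log-generator
        namespace: test-large-logs
      spec:
        replicas: 1
        selector:
          matchLabels:
            app: large-log-generator
        template:
          metadata:
            labels:
              app: large-log-generator
          spec:
            containers:
              - name: generator
                image: registry.access.redhat.com/ubi9/ubi-minimal:latest
                command:
                  - /bin/sh
                  - -c
                  # Emit one ~10 MB single-line message every 30 seconds.
                  - |
                    while true; do
                      head -c 10485760 /dev/zero | tr '\0' 'A'; echo
                      sleep 30
                    done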
      2. Install the Red Hat OpenShift Logging Operator v6.y.z.
      3. Create a secret `splunk-secret` in the openshift-logging namespace with the correct `hecToken` from the Splunk side.
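
      A hypothetical way to create that secret (the token value and CA file path are placeholders; the key names match the ClusterLogForwarder below):

      $ oc -n openshift-logging create secret generic splunk-secret \
          --from-literal=hecToken=<splunk_hec_token> \
          --from-file=ca-bundle.crt=</path/to/splunk-ca.crt>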
      4. Create a ServiceAccount and bind the required cluster roles to it:
      $ oc -n openshift-logging create serviceaccount collector 
      $ oc -n openshift-logging adm policy add-cluster-role-to-user logging-collector-logs-writer -z collector
      $ oc -n openshift-logging adm policy add-cluster-role-to-user collect-application-logs -z collector  

      5. Create a ClusterLogForwarder configuration:

      apiVersion: observability.openshift.io/v1
      kind: ClusterLogForwarder
      metadata:
        name: collector
        namespace: openshift-logging
      spec:
        managementState: Managed
        outputs:
          - name: splunk-logstore
            splunk:
              authentication:
                token:
                  key: hecToken
                  secretName: splunk-secret                            
              url: 'https://splunk-default-service.splunk-aosqe.svc:8088'
            tls:
              ca:
                key: ca-bundle.crt
                secretName: splunk-secret
            type: splunk
        pipelines:
          - inputRefs:
              - application
            name: forward-log-splunk
            outputRefs:
              - splunk-logstore
        serviceAccount:
          name: collector 
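
      Apply the configuration with `oc apply` (the file name is illustrative):

      $ oc apply -f clusterlogforwarder-splunk.yaml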

      6. Check the status of the collector pods in the openshift-logging namespace.
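
      For example (plain name filtering is used here rather than assuming specific pod labels):

      $ oc -n openshift-logging get pods | grep collector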

      7. Check the collector pod logs.

      8. Check for `Dropping malformed HEC event` errors in the logs on the Splunk side.

      Actual results:

      • The collector pod logs show repeated HTTP errors and retries:
        $ oc logs <collector_pod_name> -n openshift-logging
        
        YYYY-MM-DDTHH:MM:SS.XXXXXXXZ YYYY-MM-DDTHH:MM:SS.XXXXXXXZ  WARN sink{component_kind="sink" component_id=output_splunk-logstore component_type=splunk_hec_logs}: vector::internal_events::http_client: HTTP error. error=connection closed before message completed error_type="request_failed" stage="processing" internal_log_rate_limit=true
        
        YYYY-MM-DDTHH:MM:SS.XXXXXXXZ YYYY-MM-DDTHH:MM:SS.XXXXXXXZ  WARN sink{component_kind="sink" component_id=output_splunk-logstore component_type=splunk_hec_logs}: vector::sinks::util::retries: Retrying after error. error=Failed to make HTTP(S) request: connection closed before message completed internal_log_rate_limit=true 
      • The following malformed-event warnings appear on the Splunk side:
        { [-]
        channel: input:HEC_OCP
        cid: w29
        cribl_cluster: HEC-OCP-Prod
        ioName: splunk_hec
        ioType: source
        level: warn
        message: Dropping malformed HEC event, enable debug to see details old_idx: splunk_admin size: 29390416
        snippet: {"event": {"hostname": "xxx-xxx.xxxx.xxx.../xxxxx...

      Expected results:

      • Vector should drop extremely large log lines emitted by the application before processing them.
      • Alternatively, it should be possible to tune Vector so that all application logs are delivered to Splunk without triggering the `Dropping malformed HEC event` error, i.e. there should be a way to manage the large payload size.
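
      One possible direction, sketched only as an idea rather than a verified fix: a `drop` filter in the ClusterLogForwarder that discards records whose message exceeds a given length before they reach the Splunk output. The length-by-regex test, the 256 KiB threshold, and whether the collector's regex engine accepts a counted repetition this large are all assumptions that would need verification:

      spec:
        filters:
          - name: drop-oversized
            type: drop
            drop:
              - test:
                  - field: .message
                    # Drop records whose message body is ~256 KiB or longer
                    # (assumed threshold; adjust to the HEC limit in use).
                    matches: '.{262144,}'
        pipelines:
          - inputRefs:
              - application
            filterRefs:
              - drop-oversized
            name: forward-log-splunk
            outputRefs:
              - splunk-logstore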

         

      Additional Info: 

              Assignee: Unassigned
              Reporter: Prithviraj Patil (rhn-support-pripatil)
              Votes: 1
              Watchers: 3