  OpenShift Logging / LOG-7811

[release-6.3] Kafka sink: MessageSizeTooLarge


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Fix Version/s: Logging 6.3.z
    • Affects Version/s: Logging 5.9.z, Logging 6.1.z, Logging 6.3.z
    • Component/s: Log Collection
    • Incidents & Support
    • Release Note Text: Before this change, requests that exceeded the broker's message.max.bytes were rejected because the collector's tuning did not set an allowable producer configuration. This change fixes that by setting the collector's Kafka client configuration to allow message sizes of MaxSize or smaller.
    • Release Note Type: Bug Fix
    • Sprint: Logging - Sprint 278
    • Severity: Important

      Description of problem:

      When a Kafka output is configured, the message is larger than 1 MB, and the output's tuning option "maxWrite" is set to allow writing a batch larger than 1 MB, for example:

        - kafka:
            url: tcp://kafka.openshift-logging.svc.cluster.local:9092/clo-topic 
            authentication:
              sasl:
                mechanism: PLAIN
                password:
                  key: password
                  secretName: kafka-vector
                username:
                  key: username
                  secretName: kafka-vector
            tuning:
              maxWrite: 6M
      

      The following error is observed:

      2025-08-25T22:52:12.077585Z ERROR sink{component_kind="sink" component_id=output_kafka_app component_type=kafka}: vector_common::internal_event::service: Service call failed. No retries or retries exhausted. error=Some(KafkaError (Message production error: MessageSizeTooLarge (Broker: Message size too large))) request_id=4 error_type="request_failed" stage="sending" internal_log_rate_limit=true
      

      *Notes*:
      1. The Kafka server accepts messages larger than 1 MB.
      2. If Fluentd (Logging v5) is used instead of Vector, it works and the messages arrive in Kafka.

      Version-Release number of selected component (if applicable):

      The issue was tested in Logging 5.9.z, 6.1.8, and 6.3.z.

      How reproducible:

      Always

      Steps to Reproduce:

      1. Deploy a Kafka server that accepts messages up to 10 MB (see the broker configuration sketch after these steps).
      2. Configure the ClusterLogForwarder to forward logs to the Kafka server and set the output's tuning option "maxWrite" to 6M, similar to:
        [...]
        spec:
          outputs:
          - kafka:
              url: tcp://kafka.openshift-logging.svc.cluster.local:9092/clo-topic 
              authentication:
                sasl:
                  mechanism: PLAIN
                  password:
                    key: password
                    secretName: kafka-vector
                  username:
                    key: username
                    secretName: kafka-vector
              tuning:
                maxWrite: 6M
            name: kafka-app
            type: kafka
          pipelines:
          - inputRefs:
            - application
            name: test-app
            outputRefs:
            - kafka-app
        
      3. Create an application that generates a log line of about 2 MB:
        $ oc new-project test-kafka
        $  kubectl create deployment hello-node --image=registry.k8s.io/e2e-test-images/agnhost:2.43 -- /agnhost serve-hostname 
        $ oc -n test-kafka rsh $(oc get pods -n test-kafka -l app=hello-node -o name)
        ~ $ for i in $(seq 1 $((2 * 1024 * 1024 / 5))); do echo -n 'hello'; done > /tmp/2mb_hello_line.txt
        ~ $ du -khs /tmp/2mb_hello_line.txt 
        2.0M    /tmp/2mb_hello_line.txt
        ~ $ cat /tmp/2mb_hello_line.txt > /proc/1/fd/1
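
      A minimal sketch for step 1, assuming a (ZooKeeper-based) Strimzi/AMQ Streams-managed Kafka cluster; the cluster name, single-node sizing, and plain listener are placeholders, and the SASL authentication used by the forwarder above is omitted. The relevant settings are "message.max.bytes" and "replica.fetch.max.bytes", which raise the broker-side limit to 10 MB:

        apiVersion: kafka.strimzi.io/v1beta2
        kind: Kafka
        metadata:
          name: my-cluster                       # placeholder name
          namespace: openshift-logging           # matches the service URL used above
        spec:
          kafka:
            replicas: 1
            listeners:
              - name: plain
                port: 9092
                type: internal
                tls: false
            config:
              # Broker-side limit for a single message/batch (10 MiB)
              message.max.bytes: 10485760
              # Followers must be able to replicate the larger messages
              replica.fetch.max.bytes: 10485760
              offsets.topic.replication.factor: 1
              transaction.state.log.replication.factor: 1
              transaction.state.log.min.isr: 1
              default.replication.factor: 1
              min.insync.replicas: 1
            storage:
              type: ephemeral
          zookeeper:
            replicas: 1
            storage:
              type: ephemeral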
        

      Actual results:

      The collector pod running on the same node as the pod that generates the ~2 MB log line reports the error below, and the logs do not arrive in Kafka.

      2025-08-25T22:52:12.077585Z ERROR sink{component_kind="sink" component_id=output_kafka_app component_type=kafka}: vector_common::internal_event::service: Service call failed. No retries or retries exhausted. error=Some(KafkaError (Message production error: MessageSizeTooLarge (Broker: Message size too large))) request_id=4 error_type="request_failed" stage="sending" internal_log_rate_limit=true
      

      If Logging v5 with Fluentd is used, the logs are received in Kafka, which shows that the Kafka server accepts messages bigger than 1 MB.

      Note: The tuning option "maxWrite" [0] is translated to "batch.max_bytes" [1], and this option is passed to "librdkafka_options.batch.size". If, instead of using "maxWrite", the ClusterLogForwarder is set to "Unmanaged" and librdkafka_options."batch.size" = "3000000" is set by hand, it continues failing.
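
      For illustration only, here is a sketch of the kind of Vector kafka sink settings involved, written in Vector's YAML config syntax (the collector configuration is generated by the operator, so the exact layout may differ; the input name and the 6000000 values are placeholders derived from "maxWrite: 6M", and the SASL settings are omitted). The librdkafka producer property "message.max.bytes" defaults to 1000000 (about 1 MB), which would explain why raising only "batch.size" is not enough:

        sinks:
          output_kafka_app:
            type: kafka
            inputs: ["pipeline_test_app"]   # placeholder upstream component name
            bootstrap_servers: "kafka.openshift-logging.svc.cluster.local:9092"
            topic: "clo-topic"
            encoding:
              codec: json
            librdkafka_options:
              # Where "maxWrite" reportedly ends up (librdkafka producer batch size)
              "batch.size": "6000000"
              # Producer-side message size limit; librdkafka defaults to 1000000,
              # so it also has to be raised for records larger than ~1 MB
              "message.max.bytes": "6000000"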

      Expected results:

      The Kafka sink honours the tuning option "maxWrite" and allows forwarding logs up to the configured size, even when it is bigger than 1 MB.

      Additional info:

              Jeffrey Cantrill (jcantril@redhat.com)
              Oscar Casal Sanchez (rhn-support-ocasalsa)
              Anping Li