AMQ Streams / ENTMQST-1529

FileStreamSourceConnector stops when using a large file


    Steps to reproduce:
      1. Create Kafka File Source Connector
      2. Create a large file
        seq 1 1 100000 > /tmp/source.txt
        
      3. Start 'org.apache.kafka.connect.file.FileStreamSourceConnector' through the Kafka Connect REST API
        curl localhost:8083/connectors -X POST -H "Content-Type: application/json" -d '{"name": "source1", "config": {"connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector", "file": "/tmp/source.txt", "tasks.max": "1", "topic": "test_topic"}}'
        
      4. Result: the File Source Connector stops after sending roughly 10,000 to 18,000 of the 100,000 messages.

      The File Source Connector stops before reaching the end of the file when the source file is large.

      • There appears to be a bug in the end condition (1).

      Also, as far as I can tell from the source code (2), in the worst case the read buffer can use up to twice as much memory as the size of the file.

      • So an OutOfMemoryError may occur with a large file.
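
      The worst case described above can be illustrated with a simplified model. This is only a sketch and not the actual FileStreamSourceTask code. It assumes the task reads into a char[] that is doubled whenever it becomes full, and that the input contains no line terminator, so nothing is ever drained from the buffer. The class and method names (BufferGrowthSketch, finalCapacity) are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class BufferGrowthSketch {

    // Reads the whole input into a char[] that doubles whenever it is full
    // and returns the final capacity of that array. No lines are extracted,
    // which models an input without any line terminator (assumed worst case).
    public static int finalCapacity(Reader reader, int initialCapacity) {
        char[] buffer = new char[initialCapacity];
        int offset = 0;
        try {
            int nread;
            // buffer.length - offset is always > 0 because a full buffer is doubled below.
            while ((nread = reader.read(buffer, offset, buffer.length - offset)) > 0) {
                offset += nread;
                if (offset == buffer.length) {
                    // Buffer is full: allocate twice the size and copy the old contents.
                    char[] bigger = new char[buffer.length * 2];
                    System.arraycopy(buffer, 0, bigger, 0, buffer.length);
                    buffer = bigger;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.length;
    }

    public static void main(String[] args) {
        // 1,048,576 characters with no newline anywhere.
        char[] content = new char[1 << 20];
        Arrays.fill(content, 'x');
        int capacity = finalCapacity(new StringReader(new String(content)), 1024);
        System.out.println("chars=" + content.length + " capacity=" + capacity);
    }
}
```

      In this model, an input whose length exactly fills the buffer triggers one more doubling, so the final capacity is twice the input length (2,097,152 characters for the 1,048,576-character input in main). An input containing regular line breaks, such as the seq output in the reproduction steps, would not be expected to hit this case if completed lines are removed from the buffer as they are read.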

      Note that the documentation (3) states that the File Connector is not recommended for production use.

      (1) kafka/connect/file/src/main/java/org/apache/kafka/connect/file/FileStreamSourceTask.java
           https://github.com/apache/kafka/blob/3cdc78e6bb1f83973a14ce1550fe3874f7348b05/connect/file/src/main/java/org/apache/kafka/connect/file/FileStreamSourceTask.java#L130

      (2) kafka/connect/file/src/main/java/org/apache/kafka/connect/file/FileStreamSourceTask.java
          https://github.com/apache/kafka/blob/3cdc78e6bb1f83973a14ce1550fe3874f7348b05/connect/file/src/main/java/org/apache/kafka/connect/file/FileStreamSourceTask.java#L136-L137

      (3) Confluent > Kafka Connect FileStream Connectors
          https://docs.confluent.io/current/connect/filestream_connector.html#connect-filestreamconnector

      The Kafka Connect FileStream Connector examples are intended to show how a simple connector runs for those first getting started with Kafka Connect as either a user or developer.
      It is not recommended for production use. Instead, we encourage users to use them to learn in a local environment.
      The examples include both a file source and a file sink to demonstrate an end-to-end data flow implemented through Kafka Connect.
      

              tbentley-1 Tom Bentley
              rhn-support-tyamashi Tomonari Yamashita