Debezium / DBZ-2664

Snapshot fails if message size exceeds max.request.size


Details

    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Major
    • Component: postgresql-connector

    Description

      The snapshot fails when a record's size exceeds the broker's maximum message size, after which Debezium no longer works properly: the container has to be restarted, and the snapshot starts over from the beginning.

      Setting both CONNECT_PRODUCER_MAX_REQUEST_SIZE and CONNECT_MESSAGE_MAX_BYTES to a value slightly higher than our largest message resolved the issue.

      • The connector ran on Kubernetes (OpenShift and Rancher); the issue occurred in both environments.
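
      The workaround above can be sketched as environment variables on the Connect container. The 20 MB value is illustrative only; in the Debezium Kafka Connect container images, CONNECT_-prefixed variables are translated into worker configuration properties.

      ```shell
      # Illustrative workaround -- choose a limit slightly above your largest record.
      # In the Debezium Connect images, CONNECT_* env vars map to worker properties,
      # e.g. CONNECT_PRODUCER_MAX_REQUEST_SIZE -> producer.max.request.size.
      export CONNECT_PRODUCER_MAX_REQUEST_SIZE=20971520   # producer request size limit (bytes)
      export CONNECT_MESSAGE_MAX_BYTES=20971520
      # Note: message.max.bytes is primarily a Kafka broker setting; the broker
      # (and any per-topic max.message.bytes override) must also accept messages
      # of this size, or the RecordTooLargeException will persist.
      ```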

      The logs:

      14/10/2020 20:40:26 org.apache.kafka.common.errors.RecordTooLargeException: The request included a message larger than the max message size the server will accept.
      14/10/2020 20:40:26 2020-10-14 18:40:26,685 INFO   ||  WorkerSourceTask{id=***-0} Committing offsets   [org.apache.kafka.connect.runtime.WorkerSourceTask]
      14/10/2020 20:40:26 2020-10-14 18:40:26,685 INFO   ||  WorkerSourceTask{id=***-0} flushing 1319 outstanding messages for offset commit   [org.apache.kafka.connect.runtime.WorkerSourceTask]
      14/10/2020 20:40:28 2020-10-14 18:40:28,385 ERROR  ||  WorkerSourceTask{id=***-0} failed to send record to ***:    [org.apache.kafka.connect.runtime.WorkerSourceTask]
      14/10/2020 20:40:28 org.apache.kafka.common.errors.RecordTooLargeException: The request included a message larger than the max message size the server will accept.
      14/10/2020 20:40:29 2020-10-14 18:40:29,253 ERROR  ||  WorkerSourceTask{id=***-0} failed to send record to ***:    [org.apache.kafka.connect.runtime.WorkerSourceTask]
      14/10/2020 20:40:29 org.apache.kafka.common.errors.RecordTooLargeException: The request included a message larger than the max message size the server will accept.
      14/10/2020 20:40:31 2020-10-14 18:40:31,685 ERROR  ||  WorkerSourceTask{id=***-0} Failed to flush, timed out while waiting for producer to flush outstanding 3 messages   [org.apache.kafka.connect.runtime.WorkerSourceTask]
      14/10/2020 20:40:31 2020-10-14 18:40:31,686 ERROR  ||  WorkerSourceTask{id=***-0} Task threw an uncaught and unrecoverable exception   [org.apache.kafka.connect.runtime.WorkerTask]
      14/10/2020 20:40:31 org.apache.kafka.connect.errors.ConnectException: Unrecoverable exception from producer send callback
      14/10/2020 20:40:31    at org.apache.kafka.connect.runtime.WorkerSourceTask.maybeThrowProducerSendException(WorkerSourceTask.java:282)
      14/10/2020 20:40:31    at org.apache.kafka.connect.runtime.WorkerSourceTask.sendRecords(WorkerSourceTask.java:336)
      14/10/2020 20:40:31    at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:264)
      14/10/2020 20:40:31    at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:185)
      14/10/2020 20:40:31    at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:235)
      14/10/2020 20:40:31    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
      14/10/2020 20:40:31    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
      14/10/2020 20:40:31    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
      14/10/2020 20:40:31    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
      14/10/2020 20:40:31    at java.base/java.lang.Thread.run(Thread.java:834)
      14/10/2020 20:40:31 Caused by: org.apache.kafka.common.errors.RecordTooLargeException: The request included a message larger than the max message size the server will accept.
      14/10/2020 20:40:31 2020-10-14 18:40:31,686 ERROR  ||  WorkerSourceTask{id=***-0} Task is being killed and will not recover until manually restarted   [org.apache.kafka.connect.runtime.WorkerTask]
      14/10/2020 20:40:31 2020-10-14 18:40:31,686 INFO   ||  Stopping down connector   [io.debezium.connector.common.BaseSourceTask]
      14/10/2020 20:40:31 2020-10-14 18:40:31,689 WARN   Postgres|staging|postgres-connector-task  Snapshot was interrupted before completion   [io.debezium.pipeline.source.AbstractSnapshotChangeEventSource]
      14/10/2020 20:40:31 2020-10-14 18:40:31,689 INFO   Postgres|staging|postgres-connector-task  Snapshot - Final stage   [io.debezium.pipeline.source.AbstractSnapshotChangeEventSource]
      14/10/2020 20:40:31 2020-10-14 18:40:31,689 WARN   Postgres|staging|postgres-connector-task  Change event source executor was interrupted   [io.debezium.pipeline.ChangeEventSourceCoordinator]
      14/10/2020 20:40:31 java.lang.InterruptedException: Interrupted while snapshotting table ***
      14/10/2020 20:40:31    at io.debezium.relational.RelationalSnapshotChangeEventSource.createDataEventsForTable(RelationalSnapshotChangeEventSource.java:329)
      14/10/2020 20:40:31    at io.debezium.relational.RelationalSnapshotChangeEventSource.createDataEvents(RelationalSnapshotChangeEventSource.java:286)
      14/10/2020 20:40:31    at io.debezium.relational.RelationalSnapshotChangeEventSource.doExecute(RelationalSnapshotChangeEventSource.java:138)
      14/10/2020 20:40:31    at io.debezium.pipeline.source.AbstractSnapshotChangeEventSource.execute(AbstractSnapshotChangeEventSource.java:63)
      14/10/2020 20:40:31    at io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:105)
      14/10/2020 20:40:31    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
      14/10/2020 20:40:31    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
      14/10/2020 20:40:31    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
      14/10/2020 20:40:31    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
      14/10/2020 20:40:31    at java.base/java.lang.Thread.run(Thread.java:834)
      14/10/2020 20:40:31 2020-10-14 18:40:31,690 INFO   Postgres|staging|postgres-connector-task  Connected metrics set to 'false'   [io.debezium.pipeline.metrics.StreamingChangeEventSourceMetrics]
      14/10/2020 20:40:31 2020-10-14 18:40:31,692 INFO   ||  [Producer clientId=connector-producer-***-0] Closing the Kafka producer with timeoutMillis = 30000 ms.   [org.apache.kafka.clients.producer.KafkaProducer]
      14/10/2020 20:41:02 2020-10-14 18:41:02,411 INFO   ||  WorkerSourceTask{id=***-0} Committing offsets   [org.apache.kafka.connect.runtime.WorkerSourceTask]
      14/10/2020 20:41:02 2020-10-14 18:41:02,412 INFO   ||  WorkerSourceTask{id=***-0} flushing 3 outstanding messages for offset commit   [org.apache.kafka.connect.runtime.WorkerSourceTask]
      14/10/2020 20:41:07 2020-10-14 18:41:07,412 ERROR  ||  WorkerSourceTask{id=***-0} Failed to flush, timed out while waiting for producer to flush outstanding 3 messages   [org.apache.kafka.connect.runtime.WorkerSourceTask]
      14/10/2020 20:41:07 2020-10-14 18:41:07,412 ERROR  ||  WorkerSourceTask{id=***-0} Failed to commit offsets   [org.apache.kafka.connect.runtime.SourceTaskOffsetCommitter]
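
      An alternative to raising the limit worker-wide is Kafka Connect's per-connector client override (available since Kafka 2.3). A sketch, assuming the worker permits overrides (connector.client.config.override.policy=All) and with hypothetical host and connector names:

      ```shell
      # Raise the producer request size for a single connector via the Connect
      # REST API instead of for the whole worker. Host, connector name, and the
      # 20 MB value are illustrative.
      curl -s -X PUT -H "Content-Type: application/json" \
        http://connect:8083/connectors/my-connector/config \
        -d '{
          "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
          "producer.override.max.request.size": "20971520"
        }'
      # The broker-side limits (message.max.bytes / per-topic max.message.bytes)
      # still apply and must accommodate the larger messages as well.
      ```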
      


          People

            Assignee: Unassigned
            Reporter: roi3363 Roi Mordechay (Inactive)
