Details
- Type: Bug
- Resolution: Not a Bug
- Priority: Major
Description
It seems the snapshot fails as soon as a record exceeds the maximum message size, after which Debezium stops working. The container then has to be restarted, and a new snapshot must start all over again.
Setting both CONNECT_PRODUCER_MAX_REQUEST_SIZE and CONNECT_MESSAGE_MAX_BYTES to a value slightly higher than our biggest message resolved the issue.
- This ran with Kubernetes on both OpenShift and Rancher; the issue occurred in both environments.
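The workaround above can be sketched as environment variables on the Connect container (a sketch, not the exact deployment; the 20 MB value is a hypothetical placeholder — pick something slightly above your largest record):

```shell
# Hedged sketch for the Debezium/Connect container image, which maps
# CONNECT_* environment variables onto Kafka Connect worker properties.
# 20971520 (20 MiB) is a placeholder value, not taken from the report.
export CONNECT_PRODUCER_MAX_REQUEST_SIZE=20971520   # -> producer.max.request.size
export CONNECT_MESSAGE_MAX_BYTES=20971520           # -> message.max.bytes
```

Note that since the RecordTooLargeException in the logs is reported by the server, the broker-side limit (`message.max.bytes`, or `max.message.bytes` on the topic) may also need to be raised to accept the larger records.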
The logs:
14/10/2020 20:40:26 org.apache.kafka.common.errors.RecordTooLargeException: The request included a message larger than the max message size the server will accept.
14/10/2020 20:40:26 2020-10-14 18:40:26,685 INFO || WorkerSourceTask{id=***-0} Committing offsets [org.apache.kafka.connect.runtime.WorkerSourceTask]
14/10/2020 20:40:26 2020-10-14 18:40:26,685 INFO || WorkerSourceTask{id=***-0} flushing 1319 outstanding messages for offset commit [org.apache.kafka.connect.runtime.WorkerSourceTask]
14/10/2020 20:40:28 2020-10-14 18:40:28,385 ERROR || WorkerSourceTask{id=***-0} failed to send record to ***: [org.apache.kafka.connect.runtime.WorkerSourceTask]
14/10/2020 20:40:28 org.apache.kafka.common.errors.RecordTooLargeException: The request included a message larger than the max message size the server will accept.
14/10/2020 20:40:29 2020-10-14 18:40:29,253 ERROR || WorkerSourceTask{id=***-0} failed to send record to ***: [org.apache.kafka.connect.runtime.WorkerSourceTask]
14/10/2020 20:40:29 org.apache.kafka.common.errors.RecordTooLargeException: The request included a message larger than the max message size the server will accept.
14/10/2020 20:40:31 2020-10-14 18:40:31,685 ERROR || WorkerSourceTask{id=***-0} Failed to flush, timed out while waiting for producer to flush outstanding 3 messages [org.apache.kafka.connect.runtime.WorkerSourceTask]
14/10/2020 20:40:31 2020-10-14 18:40:31,686 ERROR || WorkerSourceTask{id=***-0} Task threw an uncaught and unrecoverable exception [org.apache.kafka.connect.runtime.WorkerTask]
14/10/2020 20:40:31 org.apache.kafka.connect.errors.ConnectException: Unrecoverable exception from producer send callback
14/10/2020 20:40:31 at org.apache.kafka.connect.runtime.WorkerSourceTask.maybeThrowProducerSendException(WorkerSourceTask.java:282)
14/10/2020 20:40:31 at org.apache.kafka.connect.runtime.WorkerSourceTask.sendRecords(WorkerSourceTask.java:336)
14/10/2020 20:40:31 at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:264)
14/10/2020 20:40:31 at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:185)
14/10/2020 20:40:31 at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:235)
14/10/2020 20:40:31 at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
14/10/2020 20:40:31 at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
14/10/2020 20:40:31 at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
14/10/2020 20:40:31 at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
14/10/2020 20:40:31 at java.base/java.lang.Thread.run(Thread.java:834)
14/10/2020 20:40:31 Caused by: org.apache.kafka.common.errors.RecordTooLargeException: The request included a message larger than the max message size the server will accept.
14/10/2020 20:40:31 2020-10-14 18:40:31,686 ERROR || WorkerSourceTask{id=***-0} Task is being killed and will not recover until manually restarted [org.apache.kafka.connect.runtime.WorkerTask]
14/10/2020 20:40:31 2020-10-14 18:40:31,686 INFO || Stopping down connector [io.debezium.connector.common.BaseSourceTask]
14/10/2020 20:40:31 2020-10-14 18:40:31,689 WARN Postgres|staging|postgres-connector-task Snapshot was interrupted before completion [io.debezium.pipeline.source.AbstractSnapshotChangeEventSource]
14/10/2020 20:40:31 2020-10-14 18:40:31,689 INFO Postgres|staging|postgres-connector-task Snapshot - Final stage [io.debezium.pipeline.source.AbstractSnapshotChangeEventSource]
14/10/2020 20:40:31 2020-10-14 18:40:31,689 WARN Postgres|staging|postgres-connector-task Change event source executor was interrupted [io.debezium.pipeline.ChangeEventSourceCoordinator]
14/10/2020 20:40:31 java.lang.InterruptedException: Interrupted while snapshotting table ***
14/10/2020 20:40:31 at io.debezium.relational.RelationalSnapshotChangeEventSource.createDataEventsForTable(RelationalSnapshotChangeEventSource.java:329)
14/10/2020 20:40:31 at io.debezium.relational.RelationalSnapshotChangeEventSource.createDataEvents(RelationalSnapshotChangeEventSource.java:286)
14/10/2020 20:40:31 at io.debezium.relational.RelationalSnapshotChangeEventSource.doExecute(RelationalSnapshotChangeEventSource.java:138)
14/10/2020 20:40:31 at io.debezium.pipeline.source.AbstractSnapshotChangeEventSource.execute(AbstractSnapshotChangeEventSource.java:63)
14/10/2020 20:40:31 at io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:105)
14/10/2020 20:40:31 at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
14/10/2020 20:40:31 at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
14/10/2020 20:40:31 at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
14/10/2020 20:40:31 at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
14/10/2020 20:40:31 at java.base/java.lang.Thread.run(Thread.java:834)
14/10/2020 20:40:31 2020-10-14 18:40:31,690 INFO Postgres|staging|postgres-connector-task Connected metrics set to 'false' [io.debezium.pipeline.metrics.StreamingChangeEventSourceMetrics]
14/10/2020 20:40:31 2020-10-14 18:40:31,692 INFO || [Producer clientId=connector-producer-***-0] Closing the Kafka producer with timeoutMillis = 30000 ms. [org.apache.kafka.clients.producer.KafkaProducer]
14/10/2020 20:41:02 2020-10-14 18:41:02,411 INFO || WorkerSourceTask{id=***-0} Committing offsets [org.apache.kafka.connect.runtime.WorkerSourceTask]
14/10/2020 20:41:02 2020-10-14 18:41:02,412 INFO || WorkerSourceTask{id=***-0} flushing 3 outstanding messages for offset commit [org.apache.kafka.connect.runtime.WorkerSourceTask]
14/10/2020 20:41:07 2020-10-14 18:41:07,412 ERROR || WorkerSourceTask{id=***-0} Failed to flush, timed out while waiting for producer to flush outstanding 3 messages [org.apache.kafka.connect.runtime.WorkerSourceTask]
14/10/2020 20:41:07 2020-10-14 18:41:07,412 ERROR || WorkerSourceTask{id=***-0} Failed to commit offsets [org.apache.kafka.connect.runtime.SourceTaskOffsetCommitter]