Issue Type: Bug
Resolution: Duplicate
Priority: Major
What Debezium connector do you use and what version?
Debezium Server 2.5.0-SNAPSHOT.
What is the connector configuration?
DEBEZIUM_SINK_TYPE: kinesis
DEBEZIUM_SINK_KINESIS_REGION: awsregion
DEBEZIUM_SOURCE_CONNECTOR_CLASS: io.debezium.connector.mysql.MySqlConnector
DEBEZIUM_SOURCE_OFFSET_STORAGE_FILE_FILENAME: data/offsets.dat
DEBEZIUM_SOURCE_OFFSET_FLUSH_INTERVAL_MS: 60000
DEBEZIUM_SOURCE_OFFSET_FLUSH_TIMEOUT_MS: 10000
DEBEZIUM_SOURCE_MAX_REQUEST_SIZE: 10485760
DEBEZIUM_SOURCE_MAX_QUEUE_SIZE: 81290
DEBEZIUM_SOURCE_MAX_BATCH_SIZE: 20480
DEBEZIUM_SOURCE_SNAPSHOT_MODE: schema_only
DEBEZIUM_SOURCE_SNAPSHOT_LOCKING_MODE: none
DEBEZIUM_SOURCE_DECIMAL_HANDLING_MODE: double
DEBEZIUM_SOURCE_DATABASE_INCLUDE_LIST: dbname
DEBEZIUM_SOURCE_TOPIC_PREFIX: dbz
DEBEZIUM_SOURCE_SCHEMA_HISTORY_INTERNAL: io.debezium.storage.file.history.FileSchemaHistory
DEBEZIUM_SOURCE_SCHEMA_HISTORY_INTERNAL_FILE_FILENAME: data/schema_history.dat
DEBEZIUM_SOURCE_SCHEMA_HISTORY_INTERNAL_STORE_ONLY_CAPTURED_DATABASES_DDL: true
DEBEZIUM_SOURCE_EVENT_PROCESSING_FAILURE_HANDLING_MODE: warn
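For anyone running Debezium Server from a properties file instead of env vars, here is a minimal sketch of the equivalent application.properties (using the standard MicroProfile env-var mapping: lowercase, underscores to dots). Only a subset is shown, and the connection properties at the end are assumptions, since they do not appear in the env config above:

# Equivalent application.properties (sketch; subset of the env config above)
debezium.sink.type=kinesis
debezium.sink.kinesis.region=awsregion
debezium.source.connector.class=io.debezium.connector.mysql.MySqlConnector
debezium.source.offset.storage.file.filename=data/offsets.dat
debezium.source.snapshot.mode=schema_only
debezium.source.database.include.list=dbname
debezium.source.topic.prefix=dbz
debezium.source.schema.history.internal=io.debezium.storage.file.history.FileSchemaHistory
debezium.source.schema.history.internal.file.filename=data/schema_history.dat
# Connection settings (assumed; not shown in the env config above):
debezium.source.database.hostname=<host>
debezium.source.database.port=3306
debezium.source.database.user=<user>
debezium.source.database.password=<password>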
What is the captured database version and mode of deployment?
(E.g. on-premises, with a specific cloud provider, etc.)
AWS RDS - MariaDB 10.4.26
What behaviour do you expect?
The connector should stream the change events without deserialization errors.
What behaviour do you see?
I'm running Debezium Server (2.5.0) on AWS to CDC my database (MariaDB 10.4.26) to the data lake, but I'm getting a lot of deserialization errors. I checked what could be happening and found that it always occurs with DML statements that affect a huge number of rows, something like 100-200k. I tried increasing the request size, queue size, and batch size, but without success.
{ "timestamp": "2023-10-10T14:13:10.084Z", "sequence": 3594, "loggerClassName": "org.slf4j.impl.Slf4jLogger", "loggerName": "io.debezium.connector.mysql.MySqlStreamingChangeEventSource", "level": "WARN", "message": "A deserialization failure event arrived", "threadName": "blc-XXXXXX3306", "threadId": 141, "mdc":
{ "dbz.taskId": "0", "dbz.connectorName": "dbz", "dbz.connectorType": "MySQL", "dbz.connectorContext": "binlog" }, "ndc": "", "hostName": "XXXXXX", "processName": "io.debezium.server.Main", "processId": 1, "exception": { "refId": 1, "exceptionType": "com.github.shyiko.mysql.binlog.event.deserialization.EventDataDeserializationException", "message": "Failed to deserialize data of EventHeaderV4{timestamp=1696946983000, eventType=WRITE_ROWS, serverId=592457888, headerLength=19, dataLength=8194, nextPosition=625444, flags=0}", "frames": [ { "class": "com.github.shyiko.mysql.binlog.event.deserialization.EventDeserializer", "method": "deserializeEventData", "line": 343 }, { "class": "com.github.shyiko.mysql.binlog.event.deserialization.EventDeserializer", "method": "nextEvent", "line": 246 }, { "class": "io.debezium.connector.mysql.MySqlStreamingChangeEventSource$1", "method": "nextEvent", "line": 233 }, { "class": "com.github.shyiko.mysql.binlog.BinaryLogClient", "method": "listenForEventPackets", "line": 1051 }, { "class": "com.github.shyiko.mysql.binlog.BinaryLogClient", "method": "connect", "line": 631 }, { "class": "com.github.shyiko.mysql.binlog.BinaryLogClient$7", "method": "run", "line": 932 }, { "class": "java.lang.Thread", "method": "run", "line": 829 } ], "causedBy": { "exception": { "refId": 2, "exceptionType": "com.github.shyiko.mysql.binlog.event.deserialization.MissingTableMapEventException", "message": "No TableMapEventData has been found for table id:351. Usually that means that you have started reading binary log 'within the logical event group' (e.g. from WRITE_ROWS and not proceeding TABLE_MAP", "frames": [
{ "class": "com.github.shyiko.mysql.binlog.event.deserialization.AbstractRowsEventDataDeserializer", "method": "deserializeRow", "line": 109 },
{ "class": "com.github.shyiko.mysql.binlog.event.deserialization.WriteRowsEventDataDeserializer", "method": "deserializeRows", "line": 64 },
{ "class": "com.github.shyiko.mysql.binlog.event.deserialization.WriteRowsEventDataDeserializer", "method": "deserialize", "line": 56 },
{ "class": "com.github.shyiko.mysql.binlog.event.deserialization.WriteRowsEventDataDeserializer", "method": "deserialize", "line": 32 },
{ "class": "com.github.shyiko.mysql.binlog.event.deserialization.EventDeserializer", "method": "deserializeEventData", "line": 337 },
{ "class": "com.github.shyiko.mysql.binlog.event.deserialization.EventDeserializer", "method": "nextEvent", "line": 246 },
{ "class": "io.debezium.connector.mysql.MySqlStreamingChangeEventSource$1", "method": "nextEvent", "line": 233 },
{ "class": "com.github.shyiko.mysql.binlog.BinaryLogClient", "method": "listenForEventPackets", "line": 1051 },
{ "class": "com.github.shyiko.mysql.binlog.BinaryLogClient", "method": "connect", "line": 631 },
{ "class": "com.github.shyiko.mysql.binlog.BinaryLogClient$7", "method": "run", "line": 932 },
{ "class": "java.lang.Thread", "method": "run", "line": 829 }] } } } }
{ "timestamp": "2023-10-10T14:13:10.085Z", "sequence": 3595, "loggerClassName": "org.slf4j.impl.Slf4jLogger", "loggerName": "io.debezium.connector.mysql.MySqlStreamingChangeEventSource", "level": "WARN", "message": "Error during binlog processing. Last offset stored = {transaction_id=null, ts_sec=1696946983, file=mysql-bin-changelog.580781, pos=0, server_id=592457888, event=1}, binlog reader near position = mysql-bin-changelog.580781/551527", "threadName": "XXXXXX:3306", "threadId": 141, "mdc":
{ "dbz.taskId": "0", "dbz.connectorName": "dbz", "dbz.connectorType": "MySQL", "dbz.connectorContext": "binlog" }, "ndc": "", "hostName": "XXXXXXXX", "processName": "io.debezium.server.Main", "processId": 1 }
The statement is a CREATE OR REPLACE TABLE. Reading the binlog file directly, it corresponds to roughly 200k inserts.
CREATE OR REPLACE TABLE table AS
SELECT DISTINCT(id)
FROM example ex
WHERE date_start >= DATE_ADD(NOW(), INTERVAL -45 DAY)
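To confirm what actually sits at the failing offset, the binlog can also be inspected from a SQL session. A sketch, with the file name and position taken from the "binlog reader near position" warning above (note that FROM must land on an event boundary to succeed):

-- Inspect events around the position reported by the binlog reader
-- ("binlog reader near position = mysql-bin-changelog.580781/551527")
SHOW BINLOG EVENTS IN 'mysql-bin-changelog.580781' FROM 551527 LIMIT 20;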
Do you see the same behaviour using the latest released Debezium version?
(Ideally, also verify with latest Alpha/Beta/CR version)
Yes, on all versions. I'm already using the latest code.
Do you have the connector logs, ideally from start till finish?
(You might be asked later to provide DEBUG/TRACE level log)
Yes.
How to reproduce the issue using our tutorial deployment?
Send a CREATE OR REPLACE TABLE statement to the database whose SELECT clause returns more than 200k rows.
CREATE OR REPLACE TABLE table AS
SELECT DISTINCT(id)
FROM example ex
WHERE date_start >= DATE_ADD(NOW(), INTERVAL -45 DAY)
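A minimal seed script to get the source table above that row count, assuming the table layout implied by the statement (the column types and the 250k row count are assumptions):

-- Seed the source table; seq_1_to_250000 is a virtual table from
-- MariaDB's built-in SEQUENCE engine, generating 250k rows
CREATE TABLE example (
  id INT NOT NULL,
  date_start DATETIME NOT NULL
);

INSERT INTO example (id, date_start)
SELECT seq, NOW()
FROM seq_1_to_250000;

With row-based binlogging (the usual setting for CDC), the CREATE OR REPLACE TABLE ... AS SELECT is then logged as one long run of WRITE_ROWS events in a single event group, which is the pattern that triggers the failure described above.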