Uploaded image for project: 'Debezium'
  1. Debezium
  2. DBZ-7616

Provide query timeout property to avoid indefinitely hangs during queries

XMLWordPrintable

    • Critical

      What Debezium connector do you use and what version?

      MySQL 2.4.0

      What is the connector configuration?

      ```

      ...
          include.schema.changes: true
          incremental.snapshot.chunk.size: 10240
      ...
          max.batch.size: 40960
          max.queue.size: 81920
          message.key.columns: redacted.my_problem_table:customer_id,field_id
          offset.flush.timeout.ms: 60000
          offset.flush.interval.ms: 10000
          read.only: true
      ...
          signal.consumer.sasl.jaas.config: "redacted"
          signal.consumer.sasl.mechanism: SCRAM-SHA-512
          signal.consumer.security.protocol: SASL_SSL
          signal.data.collection: redacted.debezium_signal
          signal.kafka.bootstrap.servers: strimzi-kafka-bootstrap.staging.redacted.com:9096
          signal.kafka.topic: debezium.sdlc-redacted-pxc.redacted.debezium_signal
      ...
          topic.creation.groups: my_problem_table
          topic.creation.my_problem_table.include: debezium.sdlc-redacted-pxc.redacted.my_problem_table
          topic.creation.my_problem_table.delete.retention.ms: 604800000 # 1 week deletion retention on tombstones
          topic.creation.my_problem_table.cleanup.policy: compact
          topic.creation.my_problem_table.partitions: 36
          topic.creation.my_problem_table.unclean.leader.election.enable: false

      ```

      What is the captured database version and mode of depoyment?

      MySQL 8.0 Percona XtraDB (PXC) cluster

      What behaviour do you expect?

      If the `SELECT * ... LIMIT 1` query takes longer than it should, the connector should fail gracefully and report the failure in the logs.

      What behaviour do you see?

      The connector seems to continue to wait indefinitely. After a couple of minutes the connector will log a Keepalive INFO log that it lost connection. The query hangs and doesn't return.

      The `SELECT` statement that comes through shows it has `message.key.columns` out of order:

      ```

      SELECT * FROM redacted.my_problem_table ORDER BY field_id DESC, customer_id DESC LIMIT 1

      ```

      No other signals make it into the topic, making the snapshot impossible to stop. I have TRACE logging enabled and there's nothing in them that might give a clue as to what the connector is trying to do. An indefinite amount of time will pass, and eventually the signals will all come through to the topic, and from then on, the latest signal will log every two hours. This doesn't seem to trigger additional snapshots, but I'm guessing it's also not expected.

      Do you see the same behaviour using the latest relesead Debezium version?

      I've tried on 2.4.0 and 2.2.0

      Do you have the connector logs, ideally from start till finish?

      ```

      {{2024-03-07 22:05:34,378 INFO [debezium.sdlc-redacted-pxc|task-0] Requested 'INCREMENTAL' snapshot of data collections '[redacted.my_problem_table]' with additional conditions '[]' and surrogate key 'PK of table will be used' (io.debezium.pipeline.signal.actions.snapshotting.ExecuteSnapshot) [blc-redacted-pxc-nomi.staging.redacted.com:3306]2024-03-07 22:05:34,379 DEBUG [debezium.sdlc-redacted-pxc|task-0] Starting new chunk with id '98e89c03-363c-487e-bf60-5f10f0e0e084' (io.debezium.pipeline.source.snapshot.incremental.AbstractIncrementalSnapshotContext) [blc-redacted-pxc-nomi.staging.redacted.com:3306]2024-03-07 22:05:34,806 DEBUG [debezium.sdlc-redacted-pxc|task-0] Sending 1 records to topic debezium.sdlc-redacted-pxc.redacted.debezium_signal (io.debezium.connector.common.BaseSourceTask) [task-thread-debezium.sdlc-redacted-pxc-0]2024-03-07 22:05:34,806 INFO [debezium.sdlc-redacted-pxc|task-0] 1 records sent during previous 02:05:14.948, last recorded offset of

      {server=debezium.sdlc-redacted-pxc} partition is {transaction_id=null, ts_sec=1709838535, file=mysqld-bin.000019, pos=316, incremental_snapshot_signal_offset=null, gtids=35237b5f-cb86-11ee-bf0c-aee5d9a33803:1-34,bb70456c-c5e1-11ee-b22e-9a39132b5563:1-6, row=1, server_id=317581, event=2} (io.debezium.connector.common.BaseSourceTask) [task-thread-debezium.sdlc-redacted-pxc-0]2024-03-07 22:05:34,848 INFO [debezium.sdlc-redacted-pxc|task-0] The task will send records to topic 'debezium.sdlc-redacted-pxc.redacted.debezium_signal' for the first time. Checking whether topic exists (org.apache.kafka.connect.runtime.AbstractWorkerSourceTask) [task-thread-debezium.sdlc-redacted-pxc-0]2024-03-07 22:05:34,849 INFO [debezium.sdlc-redacted-pxc|task-0] Topic 'debezium.sdlc-redacted-pxc.redacted.debezium_signal' already exists. (org.apache.kafka.connect.runtime.AbstractWorkerSourceTask) [task-thread-debezium.sdlc-redacted-pxc-0]2024-03-07 22:06:19,907 DEBUG [debezium.sdlc-redacted-pxc|task-0|offsets] Committing offset '{server=debezium.sdlc-redacted-pxc}

      ' for partition '

      {transaction_id=null, ts_sec=1709838535, file=mysqld-bin.000019, pos=316, incremental_snapshot_signal_offset=null, gtids=35237b5f-cb86-11ee-bf0c-aee5d9a33803:1-34,bb70456c-c5e1-11ee-b22e-9a39132b5563:1-6, row=1, server_id=317581, event=2}

      ' (io.debezium.connector.common.BaseSourceTask) [SourceTaskOffsetCommitter-1]Mar 07, 2024 10:07:34 PM com.github.shyiko.mysql.binlog.BinaryLogClient$5 runINFO: Keepalive: Trying to restore lost connection to redacted-pxc-nomi.staging.redacted.com:3306}}
      ```

      How to reproduce the issue using our tutorial deployment?

      There are two issues at play. I'll open a bug issue for the second and link it here. The connector is forming its `SELECT` query putting `message.key.columns` out of order, and effectively bypassing the index the configured key creates. Here's an example schema of the problem table:

      ```

      CREATE TABLE `my_problem_table` (
        `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
        `item_id` int(11) unsigned NOT NULL,
        `field_id` tinyint(3) unsigned NOT NULL,
        `val` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
        `customer_id` int(10) unsigned DEFAULT NULL,
        PRIMARY KEY (`id`),
        UNIQUE KEY `idx_my_problem_table_coll_id_field_id` (`customer_id`,`field_id`),
        KEY `idx_coll_notes_uid_release` (`uid`,`item_id`)
      ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

      ```

              rh-ee-mvitale Mario Fiore Vitale
              jonslusher Jon Slusher (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

                Created:
                Updated:
                Resolved: