  1. Debezium
  2. DBZ-5661

Embedded Engine or Server retrying indefinitely on all types of retriable errors


    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • Fix Version: 2.1.0.Alpha2
    • Affects Version: 2.0.0.Beta2
    • Component: embedded-engine

      What Debezium connector do you use and what version?

      Debezium 2.0.0.Beta2

      What is the connector configuration?

      Any configuration for SQL Server but the issue would be the same for any database.

      What is the captured database version and mode of deployment?

      SQL Server

      What behaviour do you expect?

      In Kafka Connect, the framework retries tasks based on the following props:

      errors.retry.timeout=-1
      errors.retry.delay.max.ms=X 

      I would expect the embedded engine to honour these or to provide a different mechanism for limiting or disabling retries on RetriableExceptions altogether.
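To make the requested semantics concrete, here is a minimal sketch of how the two Kafka Connect properties could be interpreted by a standalone helper. A timeout of -1 retries forever, 0 disables retries, and a positive value is a total retry budget in milliseconds. The delay between attempts doubles until it reaches errors.retry.delay.max.ms. The class name RetryBudget, its method names, and the configurable starting delay are assumptions made for illustration; none of them is Debezium or Kafka Connect API.

```java
/** Hypothetical helper for illustration only; not part of Debezium or Kafka Connect. */
final class RetryBudget {
    private final long retryTimeoutMs; // errors.retry.timeout: -1 = unlimited, 0 = no retries
    private final long maxDelayMs;     // errors.retry.delay.max.ms: cap on the wait between attempts
    private final long initialDelayMs; // assumed starting delay, not taken from the issue

    RetryBudget(long retryTimeoutMs, long maxDelayMs, long initialDelayMs) {
        this.retryTimeoutMs = retryTimeoutMs;
        this.maxDelayMs = maxDelayMs;
        this.initialDelayMs = initialDelayMs;
    }

    /** Returns true if another attempt is allowed after elapsedMs spent retrying. */
    boolean shouldRetry(long elapsedMs) {
        if (retryTimeoutMs < 0) {
            return true; // -1 means retry without a time limit
        }
        return elapsedMs < retryTimeoutMs; // 0 therefore never retries
    }

    /** Returns the wait before the given retry attempt (1-based), doubling up to the cap. */
    long nextDelayMs(int attempt) {
        long delay = initialDelayMs;
        for (int i = 1; i < attempt && delay < maxDelayMs; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxDelayMs);
    }
}
```

With a 60-second budget, a 10-second cap and a 300 ms starting delay, the waits would grow as 300, 600, 1200 ms and so on until they reach the cap, and retrying would stop once 60 seconds have elapsed.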

      What behaviour do you see?

      It seems that for the embedded engine (and by extension, server) these settings are not considered and any RetriableException leads to an infinite loop of retries.

      Do you see the same behaviour using the latest released Debezium version?

      This only became a problem for us once https://issues.redhat.com/browse/DBZ-5244 was merged. Some SQL Server exceptions, such as

      com.microsoft.sqlserver.jdbc.SQLServerException: Invalid object name 'cdc.lsn_time_mapping'

      were not retriable before but are now deemed retriable, so they enter an infinite retry loop.

      Do you have the connector logs, ideally from start till finish?

      2022-09-28 13:11:40,863 INFO  [debezium-sqlserverconnector-Northwind_NoCDC-change-event-source-coordinator] io.debezium.pipeline.ChangeEventSourceCoordinator: Connected metrics set to 'false' 
      2022-09-28 13:11:41,078 WARN  [pool-3-thread-1] io.debezium.connector.common.BaseSourceTask: Going to restart connector after 10 sec. after a retriable exception 
      2022-09-28 13:11:41,079 INFO  [pool-64-thread-1] io.debezium.jdbc.JdbcConnection: Connection gracefully closed 
      2022-09-28 13:11:41,079 INFO  [pool-3-thread-1] io.debezium.embedded.EmbeddedEngine: Retrieable exception thrown, connector will be restarted org.apache.kafka.connect.errors.RetriableException: An exception occurred in the change event producer. This connector will be restarted. 
      

      The error repeats indefinitely.

      How to reproduce the issue using our tutorial deployment?

      For us, the easiest way to reproduce the error was to point the engine at a SQL Server database that does not have CDC enabled. This results in the exception shown above, which is deemed retriable but in fact isn't.

      Implementation ideas (optional)

      There are probably two ways to tackle this:

      1. Keep the inverted logic for retriable errors (i.e. all SQLExceptions are retriable by default), but introduce a way to hard-code some as non-retriable (the CDC one above would be an example).
      2. Make EmbeddedEngine honour errors.retry.timeout, or perhaps a brand new property, so that the engine is at least prevented from looping indefinitely.
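The two ideas above could be combined roughly as follows. This is only a sketch under stated assumptions: RetryPolicySketch, isRetriable, runWithRetries, the deny-list of message fragments, and the maxRetries parameter are hypothetical names invented here, not existing Debezium API. A real restart loop would also wait between attempts, which is omitted to keep the example short.

```java
import java.sql.SQLException;
import java.util.List;
import java.util.concurrent.Callable;

/** Hypothetical sketch for illustration; names are not Debezium API. */
final class RetryPolicySketch {
    // Assumed deny-list: message fragments that mark a failure as permanent.
    private static final List<String> NON_RETRIABLE_FRAGMENTS =
            List.of("Invalid object name 'cdc.lsn_time_mapping'");

    /** Idea 1: SQLExceptions stay retriable by default unless the message is deny-listed. */
    static boolean isRetriable(Throwable error) {
        boolean sqlCause = false;
        for (Throwable t = error; t != null; t = t.getCause()) {
            String message = t.getMessage();
            if (message != null) {
                for (String fragment : NON_RETRIABLE_FRAGMENTS) {
                    if (message.contains(fragment)) {
                        return false; // known permanent failure, do not retry
                    }
                }
            }
            if (t instanceof SQLException) {
                sqlCause = true;
            }
        }
        return sqlCause;
    }

    /** Idea 2: cap the number of retries; a negative maxRetries keeps retrying without limit. */
    static <T> T runWithRetries(Callable<T> task, int maxRetries) throws Exception {
        int retries = 0;
        while (true) {
            try {
                return task.call();
            } catch (Exception e) {
                boolean exhausted = maxRetries >= 0 && retries >= maxRetries;
                if (!isRetriable(e) || exhausted) {
                    throw e; // give up instead of looping indefinitely
                }
                retries++;
            }
        }
    }
}
```

With this shape, the CDC-not-enabled error fails on the first attempt, while a transient connection error is retried at most maxRetries times before the last exception is propagated to the caller.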

            Assignee: Unassigned
            Reporter: Mark Bereznitsky (mark.bereznitsky)
            Votes: 0
            Watchers: 6
