Uploaded image for project: 'Debezium'
  1. Debezium
  2. DBZ-5855

IllegalStateException is thrown if task is recovering while other tasks are running

XMLWordPrintable

    • False
    • None
    • False

      Bug report

      We got the following in one of our SQLServer connectors:

      java.lang.IllegalStateException: Detected changed end offset of database history topic (previous: 8238, current: 8245). Make sure that the same history topic isn't shared by multiple connector instances.
        at io.debezium.relational.history.KafkaDatabaseHistory.getEndOffsetOfDbHistoryTopic(KafkaDatabaseHistory.java:369)
        at io.debezium.relational.history.KafkaDatabaseHistory.recoverRecords(KafkaDatabaseHistory.java:312)
        at io.debezium.relational.history.AbstractDatabaseHistory.recover(AbstractDatabaseHistory.java:112)
        at io.debezium.relational.history.DatabaseHistory.recover(DatabaseHistory.java:163)
        at io.debezium.relational.HistorizedRelationalDatabaseSchema.recover(HistorizedRelationalDatabaseSchema.java:62)
        at io.debezium.connector.sqlserver.SqlServerConnectorTask.start(SqlServerConnectorTask.java:87)
        at io.debezium.connector.common.BaseSourceTask.start(BaseSourceTask.java:130)
        at io.debezium.connector.common.BaseSourceTask.startIfNeededAndPossible(BaseSourceTask.java:207)
        at io.debezium.connector.common.BaseSourceTask.poll(BaseSourceTask.java:148)
        at org.apache.kafka.connect.runtime.WorkerSourceTask.poll(WorkerSourceTask.java:291)
        at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:248)
        at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:186)
        at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:241)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
      

      We use unique names for all our dbhistory topics. But in this case the connector has several tasks and we can restart one of them while other tasks are running.

      What Debezium connector do you use and what version?

      SQLServer, 1.9.5

      What is the connector configuration?

      class: io.debezium.connector.sqlserver.SqlServerConnector
      config:
        database.history.kafka.recovery.attempts: 8
        database.history.kafka.recovery.poll.interval.ms: 10000
        database.history.kafka.topic: ***.dbhistory
        database.names: *long list of databases*
        include.schema.changes: "false"
        max.iteration.transactions: 10000
        schema.name.adjustment.mode: none
        ...
      pause: false
      tasksMax: 3
      

      What behaviour do you expect?

      Recover procedure should finish correctly without any issue.

      What behaviour do you see?

      Exception is thrown with confusing message.

      Do you see the same behaviour using the latest relesead Debezium version?

      The code is still there.

      How to reproduce the issue using our tutorial deployment?

      Recover procedure should be rather slow (maybe with some pauses or big enough number of messages to process). At the same time the other tasks should produce message to dbhistory topic.

      Implementation ideas (optional)

      As the first step I suggest to get rid of this check:

          // The end offset should never change during recovery; doing this check here just as - a rather weak - attempt
          // to spot other connectors that share the same history topic accidentally
          if (previousEndOffset != null && !previousEndOffset.equals(endOffset)) {
               throw new IllegalStateException("Detected changed end offset of database history topic (previous: "
                  + previousEndOffset + ", current: " + endOffset
                  + "). Make sure that the same history topic isn't shared by multiple connector instances.");
          }
      

      According to the comment the check wasn't perfect at the beginning. But now when we have connectors which can work in multi task mode it leads to exceptions in valid scenarios.

      After that we may start thinking about alternative solution or in the worst case a fixed version of the same check which takes in account multi task mode.

              Unassigned Unassigned
              mikekamornikov Mikhail Kamornikau (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

                Created:
                Updated:
                Resolved: