  Debezium / DBZ-3823

SQL Server connector doesn't handle retriable errors during task start


Details

    Description

      While the existing error handler handles retriable connection errors that occur while Kafka Connect polls the task, the same logic does not apply to task start. As a result, if the underlying connection issue is not resolved within retriable.restart.connector.wait.ms, the task never recovers.
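To illustrate the missing behavior, the following is a minimal sketch of applying a retriable check to the start-up path too, retrying a bounded number of times with a fixed wait. It is a hypothetical helper under stated assumptions, not Debezium's API and not presented as the actual fix: `RetryingStartSketch`, `isRetriable`, and `startWithRetries` are invented names, matching on the `SHUTDOWN is in progress` message is an assumed classification rule, and `waitMs` only plays the role of `retriable.restart.connector.wait.ms`.

```java
/**
 * Hypothetical sketch (not Debezium code): retry a task's start-up step
 * while the failure is classified as retriable, waiting between attempts.
 */
public class RetryingStartSketch {

    // Assumed message-based check, similar in spirit to the one used on the poll path
    static boolean isRetriable(Throwable error) {
        String message = error.getMessage();
        return message != null && message.contains("SHUTDOWN is in progress");
    }

    /**
     * Runs startAction, retrying while the error is retriable and attempts remain.
     * waitMs stands in for retriable.restart.connector.wait.ms.
     * Returns the number of attempts needed to start successfully.
     */
    static int startWithRetries(Runnable startAction, int maxAttempts, long waitMs) {
        for (int attempt = 1; ; attempt++) {
            try {
                startAction.run();
                return attempt;
            } catch (RuntimeException e) {
                // Non-retriable errors, or running out of attempts, still fail the task
                if (!isRetriable(e) || attempt >= maxAttempts) {
                    throw e;
                }
                try {
                    Thread.sleep(waitMs);
                } catch (InterruptedException ie) {
                    // Restore the interrupt flag and surface the last start failure
                    Thread.currentThread().interrupt();
                    throw e;
                }
            }
        }
    }

    public static void main(String[] args) {
        // Simulated database that is still shutting down for the first two connection attempts
        int[] calls = {0};
        Runnable start = () -> {
            calls[0]++;
            if (calls[0] <= 2) {
                throw new IllegalStateException("SHUTDOWN is in progress.");
            }
        };
        int attempts = startWithRetries(start, 5, 10);
        System.out.println("started after " + attempts + " attempts");
        // prints "started after 3 attempts"
    }
}
```

Bounding the attempts keeps a database that never comes back from blocking start-up indefinitely; retrying without a limit is an equally plausible choice that the sketch leaves open.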

      The following excerpt from the worker logs shows what happens:

      [1] com.microsoft.sqlserver.jdbc.SQLServerException: SHUTDOWN is in progress.
          ...
      	at io.debezium.connector.sqlserver.SqlServerConnection.getNthTransactionLsnFromLast(SqlServerConnection.java:170)
          ...
      
      [2] Going to restart connector after 10 sec. after a retriable exception
      
      [3] org.apache.kafka.connect.errors.ConnectException: com.microsoft.sqlserver.jdbc.SQLServerException: SHUTDOWN is in progress. ClientConnectionId:8c2cd809-95ed-42ca-b8f3-886c183914b9
          ...
      Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: SHUTDOWN is in progress. ClientConnectionId:8c2cd809-95ed-42ca-b8f3-886c183914b9
          ...
      	at com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:2103)
      
      [4] Task is being killed and will not recover until manually restarted
      
      1. The streaming change source catches an exception from the database. The error handler parses the error message, recognizes the error as retriable, and converts it to a RetriableException, which it stores in the queue. The task polls the queue and rethrows the exception.
      2. BaseSourceTask catches the retriable exception and restarts the task.
      3. The task restarts, attempts to connect to the database, and fails because the server is still shutting down. This time, the exception is thrown by SqlServerConnectorTask#start, not SqlServerConnectorTask#poll, so it does not trigger the retry logic.
      4. Kafka Connect kills the task.
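The four steps above can be modelled with a small, self-contained Java simulation. It is a hypothetical sketch, not Debezium or Kafka Connect code: `RestartSequenceSketch`, `FakeTask`, `run`, and `isRetriable` are invented for illustration, the two exception classes are local stand-ins that only mirror the hierarchy of Kafka Connect's `ConnectException` and `RetriableException`, and matching on the message `SHUTDOWN is in progress` is an assumed classification rule.

```java
import java.util.ArrayList;
import java.util.List;

public class RestartSequenceSketch {

    // Local stand-ins mirroring the hierarchy of Kafka Connect's
    // org.apache.kafka.connect.errors.ConnectException and RetriableException
    static class ConnectException extends RuntimeException {
        ConnectException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    static class RetriableException extends ConnectException {
        RetriableException(Throwable cause) {
            super(cause.getMessage(), cause);
        }
    }

    // Assumed message-based classification standing in for the error handler
    static boolean isRetriable(Throwable error) {
        String message = error.getMessage();
        return message != null && message.contains("SHUTDOWN is in progress");
    }

    // Simulated task: the first start succeeds, then the database begins shutting down
    static class FakeTask {
        private int starts = 0;

        void start() {
            starts++;
            if (starts > 1) {
                // Step 3: the connection attempt in start() fails and is thrown as-is
                throw new ConnectException("connection failed during start",
                        new IllegalStateException("SHUTDOWN is in progress."));
            }
        }

        void poll() {
            // Step 1: the database error is classified and rethrown from poll()
            Exception dbError = new IllegalStateException("SHUTDOWN is in progress.");
            if (isRetriable(dbError)) {
                throw new RetriableException(dbError);
            }
            throw new ConnectException("poll failed", dbError);
        }
    }

    // Simplified runner: only poll() failures go through the retriable-restart path
    static List<String> run(FakeTask task) {
        List<String> events = new ArrayList<>();
        task.start();
        events.add("started");
        try {
            task.poll();
        } catch (RetriableException e) {
            // Step 2: the retriable exception from poll() triggers a restart
            events.add("restarting after retriable exception");
            try {
                task.start();
                events.add("restarted");
            } catch (ConnectException startError) {
                // Step 4: any failure from start() is fatal, even with the same retriable cause
                events.add("killed");
            }
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run(new FakeTask()));
        // prints [started, restarting after retriable exception, killed]
    }
}
```

The simulation ends in `killed`: the same condition that the poll path treats as recoverable becomes fatal once it surfaces from `start()`.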

      The same issue can be reproduced with the example from debezium-tutorials by stopping the SQL Server instance while the connector is running.

            People

              Assignee: Unassigned
              Reporter: Sergei Morozov (sergeimorozov)
              Votes: 0
              Watchers: 4