Uploaded image for project: 'Debezium'
  1. Debezium
  2. DBZ-1674

SQL Server - multisubnet failover does not work

    XMLWordPrintable

Details

    • Bug
    • Resolution: Unresolved
    • Major
    • Backlog
    • 0.10.0.Final
    • sqlserver-connector
    • None

    Description

      We use a MultiSubnetted AlwaysOn SQL Server 2017 cluster. When we fail over the database's Availability Group to the Secondary node the sqlserver connector dies and doesn't reconnect

      We have tried using both the transparentNetworkIPResolution, multiSubnetFailover options as well as increasing the loginTimeout value to 5 minutes all to no avail.

      The stack trace is as below:
      2019-12-16 01:17:14,619 ERROR || Producer failure [io.debezium.pipeline.ErrorHandler]
      com.microsoft.sqlserver.jdbc.SQLServerException: Connection reset
      at com.microsoft.sqlserver.jdbc.SQLServerConnection.terminate(SQLServerConnection.java:2892)
      at com.microsoft.sqlserver.jdbc.TDSChannel.write(IOBuffer.java:2045)
      at com.microsoft.sqlserver.jdbc.TDSWriter.flush(IOBuffer.java:4146)
      at com.microsoft.sqlserver.jdbc.TDSWriter.writePacket(IOBuffer.java:4048)
      at com.microsoft.sqlserver.jdbc.TDSWriter.endMessage(IOBuffer.java:3185)
      at com.microsoft.sqlserver.jdbc.TDSCommand.startResponse(IOBuffer.java:7556)
      at com.microsoft.sqlserver.jdbc.SQLServerStatement.doExecuteStatement(SQLServerStatement.java:866)
      at com.microsoft.sqlserver.jdbc.SQLServerStatement$StmtExecCmd.doExecute(SQLServerStatement.java:768)
      at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7194)
      at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:2935)
      at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:248)
      at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:223)
      at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeQuery(SQLServerStatement.java:693)
      at io.debezium.jdbc.JdbcConnection.queryAndMap(JdbcConnection.java:529)
      at io.debezium.jdbc.JdbcConnection.queryAndMap(JdbcConnection.java:399)
      at io.debezium.connector.sqlserver.SqlServerConnection.getMaxLsn(SqlServerConnection.java:91)
      at io.debezium.connector.sqlserver.SqlServerStreamingChangeEventSource.execute(SqlServerStreamingChangeEventSource.java:116)
      at io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:91)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)
      Caused by: java.net.SocketException: Connection reset
      at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:115)
      at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
      at com.microsoft.sqlserver.jdbc.TDSChannel.write(IOBuffer.java:2040)
      ... 21 more
      2019-12-16 01:17:14,622 INFO || Creating thread debezium-sqlserverconnector-MSSQLSERVER-error-handler [io.debezium.util.Threads]
      2019-12-16 01:17:14,624 ERROR || Interrupted while stopping [io.debezium.connector.sqlserver.SqlServerConnectorTask]
      java.lang.InterruptedException
      at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2067)
      at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1475)
      at java.util.concurrent.Executors$DelegatedExecutorService.awaitTermination(Executors.java:675)
      at io.debezium.pipeline.ErrorHandler.stop(ErrorHandler.java:52)
      at io.debezium.connector.sqlserver.SqlServerConnectorTask.cleanupResources(SqlServerConnectorTask.java:205)
      at io.debezium.pipeline.ErrorHandler.lambda$setProducerThrowable$0(ErrorHandler.java:42)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)
      2019-12-16 01:17:14,633 INFO || [Producer clientId=MSSQLSERVER-dbhistory] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms. [org.apache.kafka.clients.producer.KafkaProducer]
      2019-12-16 01:17:15,077 INFO || WorkerSourceTask

      {id=dev-tango-db-kafka-connect-0} Committing offsets [org.apache.kafka.connect.runtime.WorkerSourceTask]
      2019-12-16 01:17:15,078 INFO || WorkerSourceTask{id=dev-tango-db-kafka-connect-0}

      flushing 0 outstanding messages for offset commit [org.apache.kafka.connect.runtime.WorkerSourceTask]
      2019-12-16 01:17:15,084 INFO || WorkerSourceTask

      {id=dev-tango-db-kafka-connect-0} Finished commitOffsets successfully in 6 ms [org.apache.kafka.connect.runtime.WorkerSourceTask]
      2019-12-16 01:17:15,085 ERROR || WorkerSourceTask{id=dev-tango-db-kafka-connect-0}

      Task threw an uncaught and unrecoverable exception [org.apache.kafka.connect.runtime.WorkerTask]
      org.apache.kafka.connect.errors.ConnectException: An exception ocurred in the change event producer. This connector will be stopped.
      at io.debezium.connector.base.ChangeEventQueue.throwProducerFailureIfPresent(ChangeEventQueue.java:170)
      at io.debezium.connector.base.ChangeEventQueue.poll(ChangeEventQueue.java:151)
      at io.debezium.connector.sqlserver.SqlServerConnectorTask.poll(SqlServerConnectorTask.java:161)
      at org.apache.kafka.connect.runtime.WorkerSourceTask.poll(WorkerSourceTask.java:245)
      at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:221)
      at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:177)
      at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:227)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)
      Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Connection reset
      at com.microsoft.sqlserver.jdbc.SQLServerConnection.terminate(SQLServerConnection.java:2892)
      at com.microsoft.sqlserver.jdbc.TDSChannel.write(IOBuffer.java:2045)
      at com.microsoft.sqlserver.jdbc.TDSWriter.flush(IOBuffer.java:4146)
      at com.microsoft.sqlserver.jdbc.TDSWriter.writePacket(IOBuffer.java:4048)
      at com.microsoft.sqlserver.jdbc.TDSWriter.endMessage(IOBuffer.java:3185)
      at com.microsoft.sqlserver.jdbc.TDSCommand.startResponse(IOBuffer.java:7556)
      at com.microsoft.sqlserver.jdbc.SQLServerStatement.doExecuteStatement(SQLServerStatement.java:866)
      at com.microsoft.sqlserver.jdbc.SQLServerStatement$StmtExecCmd.doExecute(SQLServerStatement.java:768)
      at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7194)
      at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:2935)
      at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:248)
      at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:223)
      at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeQuery(SQLServerStatement.java:693)
      at io.debezium.jdbc.JdbcConnection.queryAndMap(JdbcConnection.java:529)
      at io.debezium.jdbc.JdbcConnection.queryAndMap(JdbcConnection.java:399)
      at io.debezium.connector.sqlserver.SqlServerConnection.getMaxLsn(SqlServerConnection.java:91)
      at io.debezium.connector.sqlserver.SqlServerStreamingChangeEventSource.execute(SqlServerStreamingChangeEventSource.java:116)
      at io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:91)
      ... 5 more
      Caused by: java.net.SocketException: Connection reset
      at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:115)
      at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
      at com.microsoft.sqlserver.jdbc.TDSChannel.write(IOBuffer.java:2040)
      ... 21 more
      2019-12-16 01:17:15,085 ERROR || WorkerSourceTask

      {id=dev-tango-db-kafka-connect-0}

      Task is being killed and will not recover until manually restarted [org.apache.kafka.connect.runtime.WorkerTask]
      2019-12-16 01:17:15,085 INFO || Connector has already been stopped [io.debezium.connector.sqlserver.SqlServerConnectorTask]
      2019-12-16 01:17:15,085 INFO || [Producer clientId=connector-producer-dev-tango-db-kafka-connect-0] Closing the Kafka producer with timeoutMillis = 30000 ms. [org.apache.kafka.clients.producer.KafkaProducer]

      The properties we have tried are:

      The base set of properties that we didn't change when we were experimenting :
      "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
      "tasks.max": "1",
      "database.server.name": process.env['SERVER_NAME'],
      "database.hostname": process.env['DATABASE_HOST_NAME'],
      "database.port": process.env['DATABASE_PORT'],
      "database.user": vaultCredentials['username'],
      "database.password": vaultCredentials['password'],
      "database.dbname": process.env['DATABASE_NAME'],
      "database.history.kafka.bootstrap.servers": process.env['BOOTSTRAP_SERVERS'],
      "database.history.kafka.topic": process.env['DATABASE_HISTORY_KAFKA_TOPIC'],
      "database.applicationIntent": "readwrite",
      "database.applicationName": connectorName,
      "database.databaseName": process.env['DATABASE_NAME'],
      "table.whitelist": process.env['TABLES_WHITE_LIST'],
      "sanitize.field.names": "true",
      "snapshot.isolation.mode": "read_uncommitted",
      "transforms": "dropPrefix",
      "transforms.dropPrefix.type": "org.apache.kafka.connect.transforms.RegexRouter",
      "transforms.dropPrefix.regex": "MSSQLSERVER.dbo.(.*)",
      "transforms.dropPrefix.replacement": process.env['KAFKA_TOPIC_RENAME_REGEX']

      1. Multisubnet failover was set to true
      "database.multiSubnetFailover": "true"

      2. Login Time out was set to 5 minutes
      "database.multiSubnetFailover": "true"
      "database.loginTimeout": "300"

      3. Transparent Network IP Resolution
      "database.transparentNetworkIPResolution": "true"

      Could someone help us understand what's happening? Other JDBC applications are failing over the the slave without any issues.

      Attachments

        Activity

          People

            Unassigned Unassigned
            regsethu Sethuraman Ramasubramanian (Inactive)
            Votes:
            1 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated: