Details
Type: Bug
Resolution: Unresolved
Priority: Major
Affects Version: 0.10.0.Final
Fix Version: None
Description
We use a multi-subnet AlwaysOn SQL Server 2017 cluster. When we fail over the database's Availability Group to the secondary node, the SQL Server connector dies and does not reconnect.
We have tried the transparentNetworkIPResolution and multiSubnetFailover driver options, as well as increasing the loginTimeout value to 5 minutes, all to no avail.
The stack trace is as below:
2019-12-16 01:17:14,619 ERROR || Producer failure [io.debezium.pipeline.ErrorHandler]
com.microsoft.sqlserver.jdbc.SQLServerException: Connection reset
at com.microsoft.sqlserver.jdbc.SQLServerConnection.terminate(SQLServerConnection.java:2892)
at com.microsoft.sqlserver.jdbc.TDSChannel.write(IOBuffer.java:2045)
at com.microsoft.sqlserver.jdbc.TDSWriter.flush(IOBuffer.java:4146)
at com.microsoft.sqlserver.jdbc.TDSWriter.writePacket(IOBuffer.java:4048)
at com.microsoft.sqlserver.jdbc.TDSWriter.endMessage(IOBuffer.java:3185)
at com.microsoft.sqlserver.jdbc.TDSCommand.startResponse(IOBuffer.java:7556)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.doExecuteStatement(SQLServerStatement.java:866)
at com.microsoft.sqlserver.jdbc.SQLServerStatement$StmtExecCmd.doExecute(SQLServerStatement.java:768)
at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7194)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:2935)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:248)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:223)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeQuery(SQLServerStatement.java:693)
at io.debezium.jdbc.JdbcConnection.queryAndMap(JdbcConnection.java:529)
at io.debezium.jdbc.JdbcConnection.queryAndMap(JdbcConnection.java:399)
at io.debezium.connector.sqlserver.SqlServerConnection.getMaxLsn(SqlServerConnection.java:91)
at io.debezium.connector.sqlserver.SqlServerStreamingChangeEventSource.execute(SqlServerStreamingChangeEventSource.java:116)
at io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:91)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketException: Connection reset
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:115)
at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
at com.microsoft.sqlserver.jdbc.TDSChannel.write(IOBuffer.java:2040)
... 21 more
2019-12-16 01:17:14,622 INFO || Creating thread debezium-sqlserverconnector-MSSQLSERVER-error-handler [io.debezium.util.Threads]
2019-12-16 01:17:14,624 ERROR || Interrupted while stopping [io.debezium.connector.sqlserver.SqlServerConnectorTask]
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2067)
at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1475)
at java.util.concurrent.Executors$DelegatedExecutorService.awaitTermination(Executors.java:675)
at io.debezium.pipeline.ErrorHandler.stop(ErrorHandler.java:52)
at io.debezium.connector.sqlserver.SqlServerConnectorTask.cleanupResources(SqlServerConnectorTask.java:205)
at io.debezium.pipeline.ErrorHandler.lambda$setProducerThrowable$0(ErrorHandler.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2019-12-16 01:17:14,633 INFO || [Producer clientId=MSSQLSERVER-dbhistory] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms. [org.apache.kafka.clients.producer.KafkaProducer]
2019-12-16 01:17:15,077 INFO || WorkerSourceTask
2019-12-16 01:17:15,078 INFO || WorkerSourceTask{id=dev-tango-db-kafka-connect-0} flushing 0 outstanding messages for offset commit [org.apache.kafka.connect.runtime.WorkerSourceTask]
2019-12-16 01:17:15,084 INFO || WorkerSourceTask
2019-12-16 01:17:15,085 ERROR || WorkerSourceTask{id=dev-tango-db-kafka-connect-0} Task threw an uncaught and unrecoverable exception [org.apache.kafka.connect.runtime.WorkerTask]
org.apache.kafka.connect.errors.ConnectException: An exception occurred in the change event producer. This connector will be stopped.
at io.debezium.connector.base.ChangeEventQueue.throwProducerFailureIfPresent(ChangeEventQueue.java:170)
at io.debezium.connector.base.ChangeEventQueue.poll(ChangeEventQueue.java:151)
at io.debezium.connector.sqlserver.SqlServerConnectorTask.poll(SqlServerConnectorTask.java:161)
at org.apache.kafka.connect.runtime.WorkerSourceTask.poll(WorkerSourceTask.java:245)
at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:221)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:177)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:227)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Connection reset
at com.microsoft.sqlserver.jdbc.SQLServerConnection.terminate(SQLServerConnection.java:2892)
at com.microsoft.sqlserver.jdbc.TDSChannel.write(IOBuffer.java:2045)
at com.microsoft.sqlserver.jdbc.TDSWriter.flush(IOBuffer.java:4146)
at com.microsoft.sqlserver.jdbc.TDSWriter.writePacket(IOBuffer.java:4048)
at com.microsoft.sqlserver.jdbc.TDSWriter.endMessage(IOBuffer.java:3185)
at com.microsoft.sqlserver.jdbc.TDSCommand.startResponse(IOBuffer.java:7556)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.doExecuteStatement(SQLServerStatement.java:866)
at com.microsoft.sqlserver.jdbc.SQLServerStatement$StmtExecCmd.doExecute(SQLServerStatement.java:768)
at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7194)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:2935)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:248)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:223)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeQuery(SQLServerStatement.java:693)
at io.debezium.jdbc.JdbcConnection.queryAndMap(JdbcConnection.java:529)
at io.debezium.jdbc.JdbcConnection.queryAndMap(JdbcConnection.java:399)
at io.debezium.connector.sqlserver.SqlServerConnection.getMaxLsn(SqlServerConnection.java:91)
at io.debezium.connector.sqlserver.SqlServerStreamingChangeEventSource.execute(SqlServerStreamingChangeEventSource.java:116)
at io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:91)
... 5 more
Caused by: java.net.SocketException: Connection reset
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:115)
at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
at com.microsoft.sqlserver.jdbc.TDSChannel.write(IOBuffer.java:2040)
... 21 more
2019-12-16 01:17:15,085 ERROR || WorkerSourceTask Task is being killed and will not recover until manually restarted [org.apache.kafka.connect.runtime.WorkerTask]
2019-12-16 01:17:15,085 INFO || Connector has already been stopped [io.debezium.connector.sqlserver.SqlServerConnectorTask]
2019-12-16 01:17:15,085 INFO || [Producer clientId=connector-producer-dev-tango-db-kafka-connect-0] Closing the Kafka producer with timeoutMillis = 30000 ms. [org.apache.kafka.clients.producer.KafkaProducer]
The properties we have tried are listed below.
The base set of properties that we did not change while experimenting:
"connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
"tasks.max": "1",
"database.server.name": process.env['SERVER_NAME'],
"database.hostname": process.env['DATABASE_HOST_NAME'],
"database.port": process.env['DATABASE_PORT'],
"database.user": vaultCredentials['username'],
"database.password": vaultCredentials['password'],
"database.dbname": process.env['DATABASE_NAME'],
"database.history.kafka.bootstrap.servers": process.env['BOOTSTRAP_SERVERS'],
"database.history.kafka.topic": process.env['DATABASE_HISTORY_KAFKA_TOPIC'],
"database.applicationIntent": "readwrite",
"database.applicationName": connectorName,
"database.databaseName": process.env['DATABASE_NAME'],
"table.whitelist": process.env['TABLES_WHITE_LIST'],
"sanitize.field.names": "true",
"snapshot.isolation.mode": "read_uncommitted",
"transforms": "dropPrefix",
"transforms.dropPrefix.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.dropPrefix.regex": "MSSQLSERVER.dbo.(.*)",
"transforms.dropPrefix.replacement": process.env['KAFKA_TOPIC_RENAME_REGEX']
1. multiSubnetFailover set to true:
"database.multiSubnetFailover": "true"
2. loginTimeout increased to 5 minutes (together with multiSubnetFailover):
"database.multiSubnetFailover": "true"
"database.loginTimeout": "300"
3. Transparent Network IP Resolution enabled:
"database.transparentNetworkIPResolution": "true"
Could someone help us understand what's happening? Other JDBC applications fail over to the secondary without any issues.
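Since the task ends up FAILED and does not recover on its own, the interim mitigation we are considering is an external watchdog that polls the Kafka Connect REST API and restarts failed tasks after a failover. A minimal sketch, assuming a single-worker setup; the Connect URL and connector name are placeholders, and `/connectors/<name>/status` and `/connectors/<name>/tasks/<id>/restart` are the standard Connect REST endpoints:

```python
import json
import time
import urllib.request

CONNECT_URL = "http://localhost:8083"      # placeholder Connect worker URL
CONNECTOR = "dev-tango-db-kafka-connect"   # placeholder connector name

def failed_task_ids(status):
    """Given the parsed JSON body of /connectors/<name>/status,
    return the ids of tasks in the FAILED state."""
    return [t["id"] for t in status.get("tasks", []) if t.get("state") == "FAILED"]

def watchdog_once():
    """Poll the connector status once and restart any failed tasks."""
    with urllib.request.urlopen(f"{CONNECT_URL}/connectors/{CONNECTOR}/status") as resp:
        status = json.load(resp)
    for task_id in failed_task_ids(status):
        # An empty-body POST to the restart endpoint triggers the restart.
        req = urllib.request.Request(
            f"{CONNECT_URL}/connectors/{CONNECTOR}/tasks/{task_id}/restart",
            data=b"", method="POST")
        urllib.request.urlopen(req)

if __name__ == "__main__":
    while True:
        watchdog_once()
        time.sleep(30)
```

This is a workaround, not a fix: ideally the connector would retry the LSN poll itself instead of surfacing the SQLServerException as a fatal producer failure.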