-
Bug
-
Resolution: Unresolved
-
Major
-
None
-
1.2.0.Final
-
None
We have found several instances in which we found a batch of missing records in Kafka for several topics from the same database. Upon further research, we found that the missed data was right after a connector restart event. We also found that the transaction id for the missing records was precisely 1 number below the restart transaction id shown in the logs. For example:
Jul 2 02:44:13 prod-node01 docker-compose[92972]: #033[36mconnect_1 |#033[0m 2020-07-02 07:44:13,070 INFO || Found previous offset after restart PostgresOffsetContext [sourceInfo=source_info[server='foo_prod'db='foo_prod', lsn=335/E00080E8, txId=23572345, timestamp=2020-07-02T07:44:12.588330Z, snapshot=FALSE], partition={server=foo_prod}, lastSnapshotRecord=false] [io.debezium.connector.common.BaseSourceTask]
Notice txId=23572345
The missing records in Kafka all have an xmin in Postgres of 23572344, 1 number previous to this:
xmin | id | updated_at ----------+-----------+------------------------------- 23572344 | 464138218 | 2020-07-02 02:44:11.518046-05 23572344 | 464138222 | 2020-07-02 02:44:11.518046-05 23572344 | 464138225 | 2020-07-02 02:44:11.518046-05 23572344 | 464138230 | 2020-07-02 02:44:11.518046-05 23572344 | 464138231 | 2020-07-02 02:44:11.518046-05 23572344 | 464138232 | 2020-07-02 02:44:11.518046-05 23572344 | 464138233 | 2020-07-02 02:44:11.518046-05 23572344 | 464138234 | 2020-07-02 02:44:11.518046-05 23572344 | 464138235 | 2020-07-02 02:44:11.518046-05 23572344 | 464138236 | 2020-07-02 02:44:11.518046-05 23572344 | 464138237 | 2020-07-02 02:44:11.518046-05 23572344 | 464138238 | 2020-07-02 02:44:11.518046-05 23572344 | 464138239 | 2020-07-02 02:44:11.518046-05 23572344 | 464138240 | 2020-07-02 02:44:11.518046-05 23572344 | 464138241 | 2020-07-02 02:44:11.518046-05 23572344 | 464138242 | 2020-07-02 02:44:11.518046-05 23572344 | 464138243 | 2020-07-02 02:44:11.518046-05 23572344 | 464138244 | 2020-07-02 02:44:11.518046-05 23572344 | 464138245 | 2020-07-02 02:44:11.518046-05 23572344 | 464138246 | 2020-07-02 02:44:11.518046-05 23572344 | 464138247 | 2020-07-02 02:44:11.518046-05 23572344 | 464138248 | 2020-07-02 02:44:11.518046-05 23572344 | 464138249 | 2020-07-02 02:44:11.518046-05 23572344 | 464138250 | 2020-07-02 02:44:11.518046-05 23572344 | 464138251 | 2020-07-02 02:44:11.518046-05 23572344 | 464138252 | 2020-07-02 02:44:11.518046-05 23572344 | 464138253 | 2020-07-02 02:44:11.518046-05 23572344 | 464138254 | 2020-07-02 02:44:11.518046-05 23572344 | 464138255 | 2020-07-02 02:44:11.518046-05 (29 rows)
Prior to this, there was a "Producer failure" leading to restart. I suspect this might have been due to another large concurrent snapshot taking place.
Here are logs from the failure:
{"preview":false,"result":{"_raw":"Jul 2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1 |#033[0m 2020-07-02 07:44:12,993 WARN || Going to restart connector after 10 sec. after a retriable exception [io.debezium.connector.common.BaseSourceTask]","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_--_::,____||________._______[","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}} {"preview":false,"result":{"_raw":"Jul 2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1 |#033[0m 2020-07-02 07:44:12,601 INFO Postgres|foo_prod|postgres-connector-task Connected metrics set to 'false' [io.debezium.pipeline.metrics.StreamingChangeEventSourceMetrics]","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_--_::,____||--______''___[...","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}} {"preview":false,"result":{"_raw":"Jul 2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1 |#033[0m 2020-07-02 07:44:12,601 INFO Postgres|foo_prod|postgres-connector-task Finished streaming [io.debezium.pipeline.ChangeEventSourceCoordinator]","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_--_::,____||--______[...]","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}} {"preview":false,"result":{"_raw":"Jul 2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1 |#033[0m #011... 12 more","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_#...__","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}} {"preview":false,"result":{"_raw":"Jul 2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1 |#033[0m #011at org.postgresql.core.v3.QueryExecutorImpl.readFromCopy(QueryExecutorImpl.java:1072)","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_#_.....(.:)","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}} {"preview":false,"result":{"_raw":"Jul 2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1 |#033[0m #011at org.postgresql.core.v3.QueryExecutorImpl.processCopyResults(QueryExecutorImpl.java:1109)","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_#_.....(.:)","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}} {"preview":false,"result":{"_raw":"Jul 2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1 |#033[0m #011at org.postgresql.core.PGStream.hasMessagePending(PGStream.java:139)","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_#_....(.:)","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}} {"preview":false,"result":{"_raw":"Jul 2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1 |#033[0m #011at org.postgresql.core.PGStream.setNetworkTimeout(PGStream.java:589)","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_#_....(.:)","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}} {"preview":false,"result":{"_raw":"Jul 2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1 |#033[0m #011at java.base/sun.security.ssl.SSLSocketImpl.setSoTimeout(SSLSocketImpl.java:73)","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_#_./....(.:)","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}} {"preview":false,"result":{"_raw":"Jul 2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1 |#033[0m #011at java.base/sun.security.ssl.BaseSSLSocketImpl.setSoTimeout(BaseSSLSocketImpl.java:639)","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_#_./....(.:)","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}} {"preview":false,"result":{"_raw":"Jul 2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1 |#033[0m #011at java.base/java.net.Socket.setSoTimeout(Socket.java:1155)","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_#_./...(.:)","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}} {"preview":false,"result":{"_raw":"Jul 2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1 |#033[0m Caused by: java.net.SocketException: Socket is closed","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[__:_..:___","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}} {"preview":false,"result":{"_raw":"Jul 2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1 |#033[0m #011at java.base/java.lang.Thread.run(Thread.java:834)","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_#_./...(.:)","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}} {"preview":false,"result":{"_raw":"Jul 2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1 |#033[0m #011at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_#_./...$.(.:)","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}} {"preview":false,"result":{"_raw":"Jul 2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1 |#033[0m #011at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_#_./....(.:)","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}} {"preview":false,"result":{"_raw":"Jul 2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1 |#033[0m #011at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_#_./....(.:)","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}} {"preview":false,"result":{"_raw":"Jul 2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1 |#033[0m #011at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_#_./...$.(.:)","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}} {"preview":false,"result":{"_raw":"Jul 2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1 |#033[0m #011at io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:101)","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_#_....$$(.:)","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}} {"preview":false,"result":{"_raw":"Jul 2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1 |#033[0m #011at io.debezium.connector.postgresql.PostgresStreamingChangeEventSource.execute(PostgresStreamingChangeEventSource.java:125)","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_#_.....(.:)","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}} {"preview":false,"result":{"_raw":"Jul 2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1 |#033[0m #011at io.debezium.connector.postgresql.connection.PostgresReplicationConnection$1.readPending(PostgresReplicationConnection.java:402)","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_#_.....$.(.:)","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}} {"preview":false,"result":{"_raw":"Jul 2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1 |#033[0m #011at org.postgresql.core.v3.replication.V3PGReplicationStream.readPending(V3PGReplicationStream.java:80)","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_#_......(.:)","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}} {"preview":false,"result":{"_raw":"Jul 2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1 |#033[0m #011at org.postgresql.core.v3.replication.V3PGReplicationStream.readInternal(V3PGReplicationStream.java:123)","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_#_......(.:)","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}} {"preview":false,"result":{"_raw":"Jul 2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1 |#033[0m #011at org.postgresql.core.v3.replication.V3PGReplicationStream.receiveNextData(V3PGReplicationStream.java:158)","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_#_......(.:)","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}} {"preview":false,"result":{"_raw":"Jul 2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1 |#033[0m #011at org.postgresql.core.v3.CopyDualImpl.readFromCopy(CopyDualImpl.java:37)","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_#_.....(.:)","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}} {"preview":false,"result":{"_raw":"Jul 2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1 |#033[0m #011at org.postgresql.core.v3.QueryExecutorImpl.readFromCopy(QueryExecutorImpl.java:1074)","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_#_.....(.:)","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}} {"preview":false,"result":{"_raw":"Jul 2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1 |#033[0m org.postgresql.util.PSQLException: Database connection failed when reading from copy","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_...:_______","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}} {"preview":false,"lastrow":true,"result":{"_raw":"Jul 2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1 |#033[0m 2020-07-02 07:44:12,601 ERROR Postgres|foo_prod|postgres-connector-task Producer failure [io.debezium.pipeline.ErrorHandler]","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_--_::,___||--______[...]","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}}
We do not know how to reproduce this. However, we can see the same exact pattern in several cases of missing data.
- relates to
-
DBZ-2288 Postgres connector may skip events during snapshot-streaming transition
- Closed