Uploaded image for project: 'Debezium'
  1. Debezium
  2. DBZ-2321

Single missed transaction id after connector restart

    XMLWordPrintable

Details

    • Bug
    • Status: Open (View Workflow)
    • Major
    • Resolution: Unresolved
    • 1.2.0.Final
    • None
    • postgresql-connector
    • None

    Description

      We have found several instances in which we found a batch of missing records in Kafka for several topics from the same database.  Upon further research, we found that the missed data was right after a connector restart event.  We also found that the transaction id for the missing records was precisely 1 number below the restart transaction id shown in the logs.  For example:

      Jul  2 02:44:13 prod-node01 docker-compose[92972]: #033[36mconnect_1  |#033[0m 2020-07-02 07:44:13,070 INFO   ||  Found previous offset after restart PostgresOffsetContext [sourceInfo=source_info[server='foo_prod'db='foo_prod', lsn=335/E00080E8, txId=23572345, timestamp=2020-07-02T07:44:12.588330Z, snapshot=FALSE], partition={server=foo_prod}, lastSnapshotRecord=false]   [io.debezium.connector.common.BaseSourceTask] 

      Notice txId=23572345

      The missing records in Kafka all have an xmin in Postgres of 23572344, 1 number previous to this:

         xmin   |    id     |          updated_at
      ----------+-----------+-------------------------------
       23572344 | 464138218 | 2020-07-02 02:44:11.518046-05
       23572344 | 464138222 | 2020-07-02 02:44:11.518046-05
       23572344 | 464138225 | 2020-07-02 02:44:11.518046-05
       23572344 | 464138230 | 2020-07-02 02:44:11.518046-05
       23572344 | 464138231 | 2020-07-02 02:44:11.518046-05
       23572344 | 464138232 | 2020-07-02 02:44:11.518046-05
       23572344 | 464138233 | 2020-07-02 02:44:11.518046-05
       23572344 | 464138234 | 2020-07-02 02:44:11.518046-05
       23572344 | 464138235 | 2020-07-02 02:44:11.518046-05
       23572344 | 464138236 | 2020-07-02 02:44:11.518046-05
       23572344 | 464138237 | 2020-07-02 02:44:11.518046-05
       23572344 | 464138238 | 2020-07-02 02:44:11.518046-05
       23572344 | 464138239 | 2020-07-02 02:44:11.518046-05
       23572344 | 464138240 | 2020-07-02 02:44:11.518046-05
       23572344 | 464138241 | 2020-07-02 02:44:11.518046-05
       23572344 | 464138242 | 2020-07-02 02:44:11.518046-05
       23572344 | 464138243 | 2020-07-02 02:44:11.518046-05
       23572344 | 464138244 | 2020-07-02 02:44:11.518046-05
       23572344 | 464138245 | 2020-07-02 02:44:11.518046-05
       23572344 | 464138246 | 2020-07-02 02:44:11.518046-05
       23572344 | 464138247 | 2020-07-02 02:44:11.518046-05
       23572344 | 464138248 | 2020-07-02 02:44:11.518046-05
       23572344 | 464138249 | 2020-07-02 02:44:11.518046-05
       23572344 | 464138250 | 2020-07-02 02:44:11.518046-05
       23572344 | 464138251 | 2020-07-02 02:44:11.518046-05
       23572344 | 464138252 | 2020-07-02 02:44:11.518046-05
       23572344 | 464138253 | 2020-07-02 02:44:11.518046-05
       23572344 | 464138254 | 2020-07-02 02:44:11.518046-05
       23572344 | 464138255 | 2020-07-02 02:44:11.518046-05
      (29 rows) 

       

      Prior to this, there was a "Producer failure" leading to restart.  I suspect this might have been due to another large concurrent snapshot taking place.

      Here are logs from the failure:

      {"preview":false,"result":{"_raw":"Jul  2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1  |#033[0m 2020-07-02 07:44:12,993 WARN   ||  Going to restart connector after 10 sec. after a retriable exception   [io.debezium.connector.common.BaseSourceTask]","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_--_::,____||________._______[","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}}
      {"preview":false,"result":{"_raw":"Jul  2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1  |#033[0m 2020-07-02 07:44:12,601 INFO   Postgres|foo_prod|postgres-connector-task  Connected metrics set to 'false'   [io.debezium.pipeline.metrics.StreamingChangeEventSourceMetrics]","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_--_::,____||--______''___[...","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}}
      {"preview":false,"result":{"_raw":"Jul  2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1  |#033[0m 2020-07-02 07:44:12,601 INFO   Postgres|foo_prod|postgres-connector-task  Finished streaming   [io.debezium.pipeline.ChangeEventSourceCoordinator]","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_--_::,____||--______[...]","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}}
      {"preview":false,"result":{"_raw":"Jul  2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1  |#033[0m #011... 12 more","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_#...__","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}}
      {"preview":false,"result":{"_raw":"Jul  2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1  |#033[0m #011at org.postgresql.core.v3.QueryExecutorImpl.readFromCopy(QueryExecutorImpl.java:1072)","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_#_.....(.:)","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}}
      {"preview":false,"result":{"_raw":"Jul  2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1  |#033[0m #011at org.postgresql.core.v3.QueryExecutorImpl.processCopyResults(QueryExecutorImpl.java:1109)","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_#_.....(.:)","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}}
      {"preview":false,"result":{"_raw":"Jul  2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1  |#033[0m #011at org.postgresql.core.PGStream.hasMessagePending(PGStream.java:139)","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_#_....(.:)","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}}
      {"preview":false,"result":{"_raw":"Jul  2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1  |#033[0m #011at org.postgresql.core.PGStream.setNetworkTimeout(PGStream.java:589)","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_#_....(.:)","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}}
      {"preview":false,"result":{"_raw":"Jul  2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1  |#033[0m #011at java.base/sun.security.ssl.SSLSocketImpl.setSoTimeout(SSLSocketImpl.java:73)","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_#_./....(.:)","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}}
      {"preview":false,"result":{"_raw":"Jul  2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1  |#033[0m #011at java.base/sun.security.ssl.BaseSSLSocketImpl.setSoTimeout(BaseSSLSocketImpl.java:639)","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_#_./....(.:)","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}}
      {"preview":false,"result":{"_raw":"Jul  2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1  |#033[0m #011at java.base/java.net.Socket.setSoTimeout(Socket.java:1155)","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_#_./...(.:)","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}}
      {"preview":false,"result":{"_raw":"Jul  2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1  |#033[0m Caused by: java.net.SocketException: Socket is closed","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[__:_..:___","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}}
      {"preview":false,"result":{"_raw":"Jul  2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1  |#033[0m #011at java.base/java.lang.Thread.run(Thread.java:834)","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_#_./...(.:)","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}}
      {"preview":false,"result":{"_raw":"Jul  2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1  |#033[0m #011at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_#_./...$.(.:)","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}}
      {"preview":false,"result":{"_raw":"Jul  2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1  |#033[0m #011at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_#_./....(.:)","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}}
      {"preview":false,"result":{"_raw":"Jul  2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1  |#033[0m #011at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_#_./....(.:)","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}}
      {"preview":false,"result":{"_raw":"Jul  2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1  |#033[0m #011at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_#_./...$.(.:)","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}}
      {"preview":false,"result":{"_raw":"Jul  2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1  |#033[0m #011at io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:101)","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_#_....$$(.:)","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}}
      {"preview":false,"result":{"_raw":"Jul  2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1  |#033[0m #011at io.debezium.connector.postgresql.PostgresStreamingChangeEventSource.execute(PostgresStreamingChangeEventSource.java:125)","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_#_.....(.:)","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}}
      {"preview":false,"result":{"_raw":"Jul  2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1  |#033[0m #011at io.debezium.connector.postgresql.connection.PostgresReplicationConnection$1.readPending(PostgresReplicationConnection.java:402)","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_#_.....$.(.:)","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}}
      {"preview":false,"result":{"_raw":"Jul  2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1  |#033[0m #011at org.postgresql.core.v3.replication.V3PGReplicationStream.readPending(V3PGReplicationStream.java:80)","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_#_......(.:)","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}}
      {"preview":false,"result":{"_raw":"Jul  2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1  |#033[0m #011at org.postgresql.core.v3.replication.V3PGReplicationStream.readInternal(V3PGReplicationStream.java:123)","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_#_......(.:)","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}}
      {"preview":false,"result":{"_raw":"Jul  2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1  |#033[0m #011at org.postgresql.core.v3.replication.V3PGReplicationStream.receiveNextData(V3PGReplicationStream.java:158)","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_#_......(.:)","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}}
      {"preview":false,"result":{"_raw":"Jul  2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1  |#033[0m #011at org.postgresql.core.v3.CopyDualImpl.readFromCopy(CopyDualImpl.java:37)","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_#_.....(.:)","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}}
      {"preview":false,"result":{"_raw":"Jul  2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1  |#033[0m #011at org.postgresql.core.v3.QueryExecutorImpl.readFromCopy(QueryExecutorImpl.java:1074)","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_#_.....(.:)","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}}
      {"preview":false,"result":{"_raw":"Jul  2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1  |#033[0m org.postgresql.util.PSQLException: Database connection failed when reading from copy","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_...:_______","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}}
      {"preview":false,"lastrow":true,"result":{"_raw":"Jul  2 02:44:12 prod-node docker-compose[92972]: #033[36mconnect_1  |#033[0m 2020-07-02 07:44:12,601 ERROR  Postgres|foo_prod|postgres-connector-task  Producer failure   [io.debezium.pipeline.ErrorHandler]","_time":"2020-07-02T02:44:12.000-0500","date_hour":"2","date_mday":"2","date_minute":"44","date_month":"july","date_second":"12","date_wday":"thursday","date_year":"2020","date_zone":"local","host":"prod-node.example.com","index":"platform","linecount":"1","punct":"___::_-_-[]:_#[__|#[_--_::,___||--______[...]","source":"/var/log/docker-compose.log","sourcetype":"docker-compose-too_small","splunk_server":"logs.example.com","timeendpos":"16","timestartpos":"0"}}
       

      We do not know how to reproduce this.  However, we can see the same exact pattern in several cases of missing data.

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              jfinzel Jeremy Finzel
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated: