Uploaded image for project: 'Debezium'
  1. Debezium
  2. DBZ-1082

Add support for partial recovery from lost slot in postgres




      As stated in the documentation, if you lose your replication slot in postgres (such as due to a failover to a secondary) then it is possible you miss changes or you need to start from an new "initial" recovery and delete the offset state.

      It would be great if there was a way to fetch any possibly missed records, and then continue on with the slot. This should be doable by keeping track of the "catalog_xmin" from the postgres slot and then in the event of a lost slot or a slot that got into an invalid state, for each table, we can do a query using the xmin like so:

      SELECT * FROM <table> WHERE xmin::text::int > <catalog_xmin>

      To explain why that works, xmin is a system column on every table and essentially gives us the last transaction to touch every record.

      This is also used by replication slots to track when vacuuming can happen.

      So, if we use the last xmin we know we were at in the slot, on loss of a slot, we can query for any records greater than the last known xmin and will we get all inserts and updates we could have missed.

      This has one large caveat (depending on your use case), in that you can miss deletes and won't be informed of them. However, for some use cases (such in computing analytics) that can be okay.

      While we still have to scan each table, for large tables, this can be significantly less data to be re-read out of the database, but it should be clearly communicated that it can drop deletes.




            Unassigned Unassigned
            addisonj Addison Higham (Inactive)
            0 Vote for this issue
            2 Start watching this issue