Over at Materialize we recently realized that the LSN information provided in the PostgreSQL record metadata is insufficient to perform O(1) deduplication: https://github.com/MaterializeInc/materialize/issues/5262
With the MySQL connector, binlog offsets are monotonically increasing per file, so we can simply track the highest offset we've seen in each file. If that offset ever goes backwards, we know we're looking at a duplicate.
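For context, here is a minimal sketch of that per-file check. The class and method names are hypothetical, not taken from our codebase or the connector, and it assumes each record carries a binlog file name and a position that is distinct within that file.

```python
class BinlogDeduper:
    """Tracks the highest binlog position seen per file (hypothetical helper)."""

    def __init__(self):
        # Maps binlog file name -> highest position seen so far in that file.
        self._high_water = {}

    def is_duplicate(self, file, pos):
        """Return True if (file, pos) does not advance past the high-water mark."""
        last = self._high_water.get(file)
        if last is not None and pos <= last:
            # Assumption: positions are distinct per record, so a position
            # at or below the high-water mark means a replayed record.
            return True
        self._high_water[file] = pos
        return False


deduper = BinlogDeduper()
results = [
    deduper.is_duplicate("mysql-bin.000001", 100),
    deduper.is_duplicate("mysql-bin.000001", 200),
    deduper.is_duplicate("mysql-bin.000001", 100),  # replayed record
    deduper.is_duplicate("mysql-bin.000002", 50),   # new file, fresh mark
]
```

Each check is a single dictionary lookup and comparison, which is what makes the dedupe O(1) per record.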
Unfortunately, as we recently learned, the `lsn` field in PostgreSQL records can go backwards in the face of interleaved transactions. Only the LSNs of transaction commit records are guaranteed to increase monotonically.
Would you folks be open to also including a `txn_final_lsn` field in the record metadata? This field would carry the "final LSN" of the transaction that the record belongs to, a value the logical streaming replication protocol already exposes: https://www.postgresql.org/docs/10/protocol-logicalrep-message-formats.html.
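To illustrate how we would use it, here is a rough sketch under some assumptions: `txn_final_lsn` is the proposed field (it does not exist today), records within one transaction arrive in increasing `lsn` order, and transactions arrive in increasing final-LSN order. Under those assumptions the pair `(txn_final_lsn, lsn)` increases lexicographically even when `lsn` alone does not. The class name and the sample LSN values are made up for illustration.

```python
class TxnLsnDeduper:
    """Dedupes on the pair (txn_final_lsn, lsn); hypothetical helper."""

    def __init__(self):
        # Highest (txn_final_lsn, lsn) pair seen so far, or None at start.
        self._last = None

    def is_duplicate(self, txn_final_lsn, lsn):
        key = (txn_final_lsn, lsn)
        # Tuples compare lexicographically: final LSN first, then record LSN.
        if self._last is not None and key <= self._last:
            return True
        self._last = key
        return False


# Two interleaved transactions (made-up LSNs):
#   A writes at LSN 10 and 30, then commits with final LSN 50.
#   B writes at LSN 20, then commits with final LSN 40.
# B commits first, so its records are emitted first. Note that the bare
# `lsn` sequence is 20, 10, 30, which goes backwards.
stream = [
    (40, 20),  # B
    (50, 10),  # A
    (50, 30),  # A
    (40, 20),  # replay of B's record
]

deduper = TxnLsnDeduper()
flags = [deduper.is_duplicate(final, lsn) for final, lsn in stream]
```

Only the replayed record is flagged, and the state is a single pair, so the check stays O(1).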
Or, is there some other metadata we could use to perform O(1) dedupe instead?