Feature request or enhancement

Introduce a new configuration option to the PostgreSQL connector to allow users to treat the Replication Slot as the durable source of truth.

Which use case/requirement will be addressed by the proposed feature?

When the Debezium PostgreSQL connector starts, it compares the LSN recorded in its offset store with the LSN reported by the replication slot. Currently, if the stored offset is behind the slot (offset_lsn < slot_lsn), Debezium enforces a "Fail Fast" policy, crashing with the error: "Saved offset is before replication slot's confirmed lsn".

While this protects against data loss from re-created slots, it also forces operators to perform a full re-snapshot of the database to recover from benign scenarios where the mismatch is due to a deliberate intervention.

Treating the Slot as the Durable Source of Truth

Similar to how Kafka's auto.offset.reset configuration allows consumers to opt in to trusting the broker's position when their local state is invalid, Debezium should allow users to opt in to trusting the PostgreSQL replication slot's position. If the connector's offset store is stale, the connector could then "jump ahead" to the slot's position rather than failing.

Key Use Cases:

Respecting Manual Intervention: If an operator manually advances the slot (via pg_replication_slot_advance) to skip corrupted WAL, it should be possible to configure Debezium to respect this change instead of refusing to start. At Zalando, we use the ephemeral MemoryOffsetBackingStore to allow us to do just this.

Recovering from Unmonitored WAL Advancement: In idle scenarios, DBZ-9641 lets users opt in to having non-monitored events advance the replication slot beyond the connector's offset. Users of a durable OffsetBackingStore who want to use this feature will need a way to accept the new slot position without a hard failure.

Implementation ideas (optional)

Deprecate the boolean slot.seek.to.known.offset.on.start and introduce the enum offset.mismatch.strategy. This defines behavior during startStreaming when offset_lsn and slot_lsn differ.
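A minimal sketch of how the deprecated boolean could map onto the proposed enum, using the value names listed under "Proposed Enum Values" in this issue. The class name, the `resolve` helper, and the rule that an explicit offset.mismatch.strategy takes precedence over the legacy flag are assumptions made for illustration; this is not existing Debezium code.

```java
import java.util.Locale;
import java.util.Map;

public class OffsetMismatchConfigSketch {

    // Value names as proposed in this issue.
    public enum OffsetMismatchStrategy { NO_VALIDATION, TRUST_OFFSET, TRUST_SLOT, TRUST_GREATER_LSN }

    static final String LEGACY_KEY = "slot.seek.to.known.offset.on.start";
    static final String NEW_KEY = "offset.mismatch.strategy";

    /**
     * Assumption: the new option wins when it is set; otherwise the deprecated
     * boolean is translated (false -> NO_VALIDATION, true -> TRUST_OFFSET).
     * An unknown strategy name makes valueOf throw IllegalArgumentException.
     */
    public static OffsetMismatchStrategy resolve(Map<String, String> config) {
        String explicit = config.get(NEW_KEY);
        if (explicit != null) {
            return OffsetMismatchStrategy.valueOf(explicit.trim().toUpperCase(Locale.ROOT));
        }
        boolean legacySeek = Boolean.parseBoolean(config.getOrDefault(LEGACY_KEY, "false"));
        return legacySeek ? OffsetMismatchStrategy.TRUST_OFFSET : OffsetMismatchStrategy.NO_VALIDATION;
    }

    public static void main(String[] args) {
        System.out.println(resolve(Map.of()));                        // NO_VALIDATION
        System.out.println(resolve(Map.of(LEGACY_KEY, "true")));      // TRUST_OFFSET
        System.out.println(resolve(Map.of(NEW_KEY, "trust_slot")));   // TRUST_SLOT
    }
}
```

With this mapping, existing configurations that only set the boolean would keep their current behavior while the enum becomes the documented option.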

Proposed Enum Values:

NO_VALIDATION (Default)
- Behavior: Attempts to stream from stored offset without validation.
  May fail with "WAL segment removed" error if WAL is unavailable. Slot recreation is not detected.
- Rationale: Maintains existing default behavior of slot.seek.to.known.offset.on.start = false. Provides backward compatibility.

TRUST_OFFSET
- Behavior: Validates the slot position. If offset_lsn > slot_lsn, advances the slot to the offset LSN using pg_replication_slot_advance(). If offset_lsn < slot_lsn, fails immediately with a clear error indicating that the slot may have been re-created.
- Rationale: Replaces slot.seek.to.known.offset.on.start = true. Recommended for production to detect slot recreation and prevent silent data inconsistencies.

TRUST_SLOT
- Behavior: If offset_lsn < slot_lsn, advance the Connector Offset ("jump ahead") to the slot's LSN. If offset_lsn > slot_lsn, fail.
- Rationale: Solves the "Hard Reset" problem. Allows recovery from WAL advancement by treating the database slot as the source of truth.

TRUST_GREATER_LSN
- Behavior: Automatically synchronize to max(offset_lsn, slot_lsn).
- Rationale: A "self-healing" mode for advanced users who want to recover from both crash-before-flush AND deliberate wal-advance scenarios automatically.
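The four behaviors above can be sketched as a single decision function. This is an illustrative sketch only: the class name, `parseLsn`, and `resolveStartLsn` are hypothetical helpers, not Debezium's API, and the side effects (advancing the slot or rewriting the stored offset) are left to the caller. LSNs are handled as unsigned 64-bit values parsed from PostgreSQL's textual "hi/lo" hexadecimal form.

```java
public class StartLsnSketch {

    // Value names as proposed in this issue.
    public enum OffsetMismatchStrategy { NO_VALIDATION, TRUST_OFFSET, TRUST_SLOT, TRUST_GREATER_LSN }

    /** Parses the textual LSN form "hi/lo" (two hex numbers) into one 64-bit value. */
    public static long parseLsn(String text) {
        String[] parts = text.split("/");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Not an LSN: " + text);
        }
        return (Long.parseLong(parts[0], 16) << 32) | Long.parseLong(parts[1], 16);
    }

    /**
     * Returns the LSN streaming would start from, or throws when the strategy
     * treats the mismatch as fatal. Comparisons are unsigned because an LSN
     * uses the full 64 bits.
     */
    public static long resolveStartLsn(OffsetMismatchStrategy strategy, long offsetLsn, long slotLsn) {
        int cmp = Long.compareUnsigned(offsetLsn, slotLsn);
        switch (strategy) {
            case NO_VALIDATION:
                // No comparison at all: stream from the stored offset.
                return offsetLsn;
            case TRUST_OFFSET:
                if (cmp < 0) {
                    throw new IllegalStateException(
                            "Saved offset is before replication slot's confirmed lsn");
                }
                // If the offset is ahead, the caller would advance the slot to it.
                return offsetLsn;
            case TRUST_SLOT:
                if (cmp > 0) {
                    throw new IllegalStateException("Saved offset is ahead of the replication slot");
                }
                // "Jump ahead": the caller would move the connector offset to the slot.
                return slotLsn;
            case TRUST_GREATER_LSN:
                return cmp >= 0 ? offsetLsn : slotLsn;
            default:
                throw new AssertionError("Unhandled strategy: " + strategy);
        }
    }

    public static void main(String[] args) {
        long offset = parseLsn("0/1000");
        long slot = parseLsn("0/2000");
        // Slot is ahead of the stored offset, so TRUST_SLOT starts at the slot's LSN.
        long start = resolveStartLsn(OffsetMismatchStrategy.TRUST_SLOT, offset, slot);
        System.out.println(Long.toHexString(start)); // 2000
    }
}
```

Keeping the decision in a pure function like this would make each strategy testable without a database connection, with the slot and offset updates applied afterwards by the connector.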

Code Location: Minimal changes to the logic in PostgresReplicationConnection#startStreaming that determines the starting LSN before the createReplicationStream call.

Assignee: Unassigned

Reporter: Conor Gallagher (Inactive)

Votes: 0

Watchers: 3

Created: 2025/11/18 5:54 PM

Updated: 2025/12/11 1:31 PM

Resolved: 2025/12/11 1:31 PM
