Type: Enhancement
Resolution: Done
Priority: Major
Hi Debezium Team,
My name is Conor, and I'm one of the Senior Principal Engineers at Zalando. First off, thank you for your work on Debezium; it has become a key component of our event infrastructure.
Feature request or enhancement
Our request is to make the pgjdbc AutomaticFlush setting (keepalive LSN flush) configurable via Debezium. A recent PR hard-coded this feature to disabled, which blocks us from upgrading Debezium:
ChainedLogicalStreamBuilder streamBuilder = pgConnection()
        .getReplicationAPI()
        .replicationStream()
        .logical()
        .withSlotName("\"" + slotName + "\"")   // slot name wrapped in escaped double quotes
        .withStartPosition(lsn.asLogSequenceNumber())
        .withAutomaticFlush(false)              // hard-coded; this is the setting we ask to make configurable
        .withSlotOptions(streamParams);
Which use case/requirement will be addressed by the proposed feature?
Our Use Case: We have a Kubernetes-native eventing solution where teams can declare stream specifications that source from Postgres replication slots. Each stream specification results in a Connector Deployment (a Spring Boot app) that uses Debezium in embedded mode to stream from Postgres databases. We currently have hundreds of these streaming applications live across our 100+ Kubernetes clusters. At peak, the combined connectors process hundreds of thousands of events per second.
The Requirement: Our single biggest operational issue when we rolled out this infrastructure offering was uncontrolled WAL growth on low-activity databases, even with heartbeat configured.
Our Solution: Our team contributed a fix to pgjdbc (#2941) that uses the server's keepalive LSN to advance the replication slot, preventing WAL build-up. This logic mimics the "smart replication feedback" in psycopg2 (#913).
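To illustrate the idea behind that fix, here is a simplified model of keepalive-driven flush feedback. It is a sketch only, not the pgjdbc source: the class `KeepaliveFlushModel` and all of its methods are hypothetical, and LSNs are modeled as plain `long` values. The assumption is that the keepalive LSN is only reported as flushed when the client has nothing received-but-unconfirmed.

```java
/**
 * Simplified, hypothetical model of keepalive-driven flush feedback.
 * This is NOT pgjdbc code; names and structure are illustrative only.
 */
final class KeepaliveFlushModel {
    private long lastReceivedLsn;   // LSN of the last change event handed to the client
    private long lastFlushedLsn;    // LSN the client has confirmed as processed
    private long reportedFlushLsn;  // LSN that would be sent to the server as "flushed"

    /** A change event arrived at the given LSN. */
    void onDataMessage(long lsn) {
        lastReceivedLsn = lsn;
    }

    /** The client confirmed that everything up to lsn has been processed. */
    void confirmFlushed(long lsn) {
        lastFlushedLsn = Math.max(lastFlushedLsn, lsn);
        reportedFlushLsn = Math.max(reportedFlushLsn, lastFlushedLsn);
    }

    /** The server sent a keepalive carrying its current WAL position. */
    void onKeepalive(long serverLsn) {
        // Only safe to jump ahead when no received event is still unconfirmed.
        boolean nothingPending = lastReceivedLsn <= lastFlushedLsn;
        if (nothingPending && serverLsn > reportedFlushLsn) {
            reportedFlushLsn = serverLsn;
        }
    }

    long reportedFlushLsn() {
        return reportedFlushLsn;
    }
}
```

On an idle database the client receives keepalives but no change events, so in this model the reported flush position follows the server's WAL position and the slot no longer pins old WAL segments.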
Results: We have run this pgjdbc fix (via version 42.7.2) in production with Debezium for nearly two years, by pinning that pgjdbc version for Debezium 2.7.4.Final. The fix completely eliminated our WAL growth issues, and we have processed billions of events with zero detected data loss from this mechanism.
The Blocker: Recent Debezium versions (e.g., via PR #6472) now hard-code this pgjdbc feature to disabled. This prevents us from upgrading, as we can no longer maintain our production stability by pinning to the correct pgjdbc version.
We require a way to opt in to this proven-safe pgjdbc feature to prevent WAL growth and unblock our upgrade path.
Implementation ideas (optional)
We suggest exposing the underlying pgjdbc setting as a connector configuration option, for example postgresql.keepalive.lsn.flush.enabled=true, with the option defaulting to false.
We understand this feature was made configurable in pgjdbc (after our initial contribution, see pgjdbc issue #3138) to resolve potential conflicts with Debezium's own LSN handling. A connector-level property would allow users like us, who have verified its safety at scale, to opt in and avoid the critical WAL growth issue.
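As a rough sketch of how such a property could be wired through, the following reads the suggested option and passes it to the builder instead of a hard-coded false. Everything here is an assumption for illustration: the property name is only the one proposed above and is not an existing Debezium option, `KeepaliveFlushConfig` is a hypothetical helper, and `FlushConfigurable` is a stand-in for the single `withAutomaticFlush(boolean)` builder method shown in the snippet earlier.

```java
import java.util.Properties;

/** Hypothetical helper; not part of Debezium or pgjdbc. */
final class KeepaliveFlushConfig {
    // Property name as proposed in this request; an assumption, not an existing option.
    static final String PROPERTY = "postgresql.keepalive.lsn.flush.enabled";

    /** Minimal stand-in for the one builder method relevant here. */
    interface FlushConfigurable<T> {
        T withAutomaticFlush(boolean enabled);
    }

    /** Returns the configured value, defaulting to false when the property is absent. */
    static boolean isEnabled(Properties connectorProps) {
        return Boolean.parseBoolean(connectorProps.getProperty(PROPERTY, "false"));
    }

    /** Passes the configured value to the builder in place of a hard-coded false. */
    static <T> T apply(Properties connectorProps, FlushConfigurable<T> builder) {
        return builder.withAutomaticFlush(isEnabled(connectorProps));
    }
}
```

Defaulting to false keeps the current behavior for everyone who does not set the property, so only users who explicitly opt in would see the keepalive flush.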
I would be open to having a call with your engineers to discuss this further if it helps.
Thanks for your consideration.