-
Bug
-
Resolution: Done
-
Major
-
0.5.1
-
None
The test was done to a small PG table with tens of rows. When kafka-connect starts with debezium-postgres, both postgres master and kafka-connect process each consumed around 50% CPU. When stop the connector, CPU usage is only around 1%.
The busy thread in kafka-connect process with jstack looked like below.
"records-stream-producer-thread" #41 prio=5 os_prio=0 tid=0x00007fe7b0060800 nid=0x3070 runnable [0x00007fe82885a000] java.lang.Thread.State: RUNNABLE at java.net.PlainSocketImpl.socketAvailable(Native Method) at java.net.AbstractPlainSocketImpl.available(AbstractPlainSocketImpl.java:490) - locked <0x00000000f4580380> (a java.net.SocksSocketImpl) at java.net.SocketInputStream.available(SocketInputStream.java:259) at org.postgresql.core.PGStream.hasMessagePending(PGStream.java:104) at org.postgresql.core.v3.QueryExecutorImpl.processCopyResults(QueryExecutorImpl.java:987) at org.postgresql.core.v3.QueryExecutorImpl.writeToCopy(QueryExecutorImpl.java:925) - locked <0x00000000f4079968> (a org.postgresql.core.v3.QueryExecutorImpl) at org.postgresql.core.v3.CopyDualImpl.writeToCopy(CopyDualImpl.java:19) at org.postgresql.core.v3.replication.V3PGReplicationStream.updateStatusInternal(V3PGReplicationStream.java:175) at org.postgresql.core.v3.replication.V3PGReplicationStream.timeUpdateStatus(V3PGReplicationStream.java:167) at org.postgresql.core.v3.replication.V3PGReplicationStream.readInternal(V3PGReplicationStream.java:111) at org.postgresql.core.v3.replication.V3PGReplicationStream.read(V3PGReplicationStream.java:60) at io.debezium.connector.postgresql.connection.PostgresReplicationConnection$1.read(PostgresReplicationConnection.java:168) at io.debezium.connector.postgresql.RecordsStreamProducer.streamChanges(RecordsStreamProducer.java:105) at io.debezium.connector.postgresql.RecordsStreamProducer.lambda$start$1(RecordsStreamProducer.java:91) at io.debezium.connector.postgresql.RecordsStreamProducer$$Lambda$123/2143981469.run(Unknown Source) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:748)
I tried to track down and found the issue could be caused by the setup of 'statusUpdateIntervalSeconds' in PostgresTaskContext.java
protected ReplicationConnection createReplicationConnection() throws SQLException { return ReplicationConnection.builder(config.jdbcConfig()) .withSlot(config.slotName()) .withPlugin(config.pluginName()) .dropSlotOnClose(config.dropSlotOnStop()) .statusUpdateIntervalSeconds(0) //never send status updates by default, they will be sent when committing the task .build(); }
A setup of 0 will cause ping to PG server crazily. The PG doc (https://jdbc.postgresql.org/documentation/head/replication.html#logical-replication) says:
It is recommended to send feedback(ping) to the database more often than configured wal_sender_timeout. In production I use value equal to wal_sender_timeout / 3. It's avoids a potential problems with networks and changes to be streamed without disconnects by timeout. To specify the feedback interval use withStatusInterval method.
So, the statusUpdateIntervalSeconds should not be 0. Please make a fix.