Uploaded image for project: 'Debezium'
  1. Debezium
  2. DBZ-250

PG connector is CPU consuming

    XMLWordPrintable

Details

    • Bug
    • Resolution: Done
    • Major
    • 0.5.1
    • 0.5.1
    • postgresql-connector
    • None

    Description

      The test was done to a small PG table with tens of rows. When kafka-connect starts with debezium-postgres, both postgres master and kafka-connect process each consumed around 50% CPU. When stop the connector, CPU usage is only around 1%.

      The busy thread in kafka-connect process with jstack looked like below.

      "records-stream-producer-thread" #41 prio=5 os_prio=0 tid=0x00007fe7b0060800 nid=0x3070 runnable [0x00007fe82885a000]
         java.lang.Thread.State: RUNNABLE
              at java.net.PlainSocketImpl.socketAvailable(Native Method)
              at java.net.AbstractPlainSocketImpl.available(AbstractPlainSocketImpl.java:490)
              - locked <0x00000000f4580380> (a java.net.SocksSocketImpl)
              at java.net.SocketInputStream.available(SocketInputStream.java:259)
              at org.postgresql.core.PGStream.hasMessagePending(PGStream.java:104)
              at org.postgresql.core.v3.QueryExecutorImpl.processCopyResults(QueryExecutorImpl.java:987)
              at org.postgresql.core.v3.QueryExecutorImpl.writeToCopy(QueryExecutorImpl.java:925)
              - locked <0x00000000f4079968> (a org.postgresql.core.v3.QueryExecutorImpl)
              at org.postgresql.core.v3.CopyDualImpl.writeToCopy(CopyDualImpl.java:19)
              at org.postgresql.core.v3.replication.V3PGReplicationStream.updateStatusInternal(V3PGReplicationStream.java:175)
              at org.postgresql.core.v3.replication.V3PGReplicationStream.timeUpdateStatus(V3PGReplicationStream.java:167)
              at org.postgresql.core.v3.replication.V3PGReplicationStream.readInternal(V3PGReplicationStream.java:111)
              at org.postgresql.core.v3.replication.V3PGReplicationStream.read(V3PGReplicationStream.java:60)
              at io.debezium.connector.postgresql.connection.PostgresReplicationConnection$1.read(PostgresReplicationConnection.java:168)
              at io.debezium.connector.postgresql.RecordsStreamProducer.streamChanges(RecordsStreamProducer.java:105)
              at io.debezium.connector.postgresql.RecordsStreamProducer.lambda$start$1(RecordsStreamProducer.java:91)
              at io.debezium.connector.postgresql.RecordsStreamProducer$$Lambda$123/2143981469.run(Unknown Source)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:748)
      

      I tried to track down and found the issue could be caused by the setup of 'statusUpdateIntervalSeconds' in PostgresTaskContext.java

      protected ReplicationConnection createReplicationConnection() throws SQLException {
              return ReplicationConnection.builder(config.jdbcConfig())
                                          .withSlot(config.slotName())
                                          .withPlugin(config.pluginName())
                                          .dropSlotOnClose(config.dropSlotOnStop())
                                          .statusUpdateIntervalSeconds(0) //never send status updates by default, they will be sent when committing the task
                                          .build();
          }
      

      A setup of 0 will cause ping to PG server crazily. The PG doc (https://jdbc.postgresql.org/documentation/head/replication.html#logical-replication) says:

      It is recommended to send feedback(ping) to the database more often than configured wal_sender_timeout. In production I use value equal to wal_sender_timeout / 3. It's avoids a potential problems with networks and changes to be streamed without disconnects by timeout. To specify the feedback interval use withStatusInterval method.

      So, the statusUpdateIntervalSeconds should not be 0. Please make a fix.

      Attachments

        Activity

          People

            gunnar.morling Gunnar Morling
            luwei114 W Lu (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: