Debezium / DBZ-1245

Postgres connector failing because empty state data is being stored in offsets topic

Details

      Seems to randomly happen shortly after new PG connectors using WAL2JSON are created.


    Description

      Sometimes a PG connector task (using the WAL2JSON decoder) gets into a failed state when it is restarted. I am seeing a status message like this:

      {"name":"my-connector","connector":{"state":"RUNNING","worker_id":"localhost:8083"},"tasks":[{"id":0,"state":"FAILED","worker_id":"localhost:8083","trace":"java.lang.NullPointerException\n\tat io.debezium.connector.postgresql.SourceInfo.load(SourceInfo.java:132)\n\tat io.debezium.connector.postgresql.PostgresConnectorTask.start(PostgresConnectorTask.java:109)\n\tat io.debezium.connector.common.BaseSourceTask.start(BaseSourceTask.java:49)\n\tat org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:198)\n\tat org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)\n\tat org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)\n\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:266)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat java.lang.Thread.run(Thread.java:748)\n"}],"type":"source"}
      

      Restarting the connector task again does not help. It looks like the state data pulled off the offsets topic is empty, so there is no LSN to load. I looked through the offsets topic, and here is what the message looks like:

      offset 27694: key ["my-connector",{"server":"cluster2"}]: {}
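
      For anyone checking their own offsets topic for the same symptom, here is a minimal sketch of the check I did by eye. It assumes the records have already been read from the topic as (offset, key, value) strings; the function name and the sample record with a non-empty "lsn" value are made up for illustration.

```python
import json

def find_empty_offsets(records, connector_name):
    """Return (topic_offset, partition) pairs whose stored state is empty.

    `records` is an iterable of (topic_offset, key_json, value_json) tuples,
    as they might be dumped from the Kafka Connect offsets topic.
    """
    empty = []
    for topic_offset, key_json, value_json in records:
        # The key is a two-element JSON array: [connector name, source partition].
        name, partition = json.loads(key_json)
        if name != connector_name:
            continue
        # A missing value and an empty JSON object both carry no LSN.
        state = json.loads(value_json) if value_json else None
        if not state:
            empty.append((topic_offset, partition))
    return empty

# Illustrative sample data; only the second record mirrors the one in this report.
records = [
    (27693, '["my-connector",{"server":"cluster2"}]', '{"lsn":123456}'),
    (27694, '["my-connector",{"server":"cluster2"}]', '{}'),
]
print(find_empty_offsets(records, "my-connector"))
```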

      I can get the connector working by manually writing an LSN of 0 to the partition and restarting the connector.
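
      A rough sketch of the record I write as the workaround, in case it helps someone else. It only builds the key and value strings; producing them to the offsets topic is left to whatever Kafka producer you use. The "lsn" field name is my assumption about what the connector reads back, and the helper name is hypothetical.

```python
import json

def build_offset_record(connector_name, server_name, lsn=0):
    """Build the (key, value) JSON strings for a replacement offset record."""
    # Compact separators keep the key in the same form as the one seen in the topic.
    key = json.dumps([connector_name, {"server": server_name}], separators=(",", ":"))
    # Assumption: the stored state keeps the position under an "lsn" field.
    value = json.dumps({"lsn": lsn}, separators=(",", ":"))
    return key, value

key, value = build_offset_record("my-connector", "cluster2")
print(key)
print(value)
```

      The key has to match the existing one so that the new record supersedes the empty one for the same connector and source partition.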

      I am not sure what is causing this empty data to be written, or whether it is related to the recent change I made to make the heartbeat fire for all events. The empty value should not be written, but maybe the SourceTask error handling should also be improved so that the connector falls back to getting the LSN from the replication slot when it cannot get it from the offsets topic.
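
      To make the suggested fallback concrete, here is a small sketch of the decision logic I have in mind. It is not the connector's actual code: the function name, the "lsn" field name, and the idea of passing in the slot's LSN as a plain number are all assumptions for illustration.

```python
def resolve_start_lsn(stored_offset, slot_lsn):
    """Pick the LSN to resume from, preferring the stored offset.

    `stored_offset` is the state loaded from the offsets topic (may be None
    or empty); `slot_lsn` is the position reported by the replication slot
    (may be None if the slot is unavailable).
    """
    # Treat a missing map and an empty map the same way instead of failing.
    lsn = (stored_offset or {}).get("lsn")
    if lsn is not None:
        return int(lsn), "offsets"
    if slot_lsn is not None:
        return int(slot_lsn), "slot"
    raise ValueError("no LSN available from the offsets topic or the replication slot")

print(resolve_start_lsn({}, 42))
print(resolve_start_lsn({"lsn": 7}, 42))
```

      With this shape, the empty record from above would resume from the slot position rather than throwing on startup.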

          People

            jpechane Jiri Pechanec
            trolison Taylor Rolison (Inactive)